<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Educ.</journal-id>
<journal-title>Frontiers in Education</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Educ.</abbrev-journal-title>
<issn pub-type="epub">2504-284X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/feduc.2023.1092714</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Education</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Lacking measurement invariance in research self-efficacy: Bug or feature?</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Klieme</surname> <given-names>Katrin Ellen</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2053826/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Schmidt-Borcherding</surname> <given-names>Florian</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/2062326/overview"/>
</contrib>
</contrib-group>
<aff><institution>Research Group for Educational Psychology and Empirical Educational Sciences, Faculty 12: Pedagogy and Educational Sciences, University of Bremen</institution>, <addr-line>Bremen</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Lan Yang, The Education University of Hong Kong, Hong Kong SAR, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Gregory Siy Ching, Fu Jen Catholic University, Taiwan; Juliette Lyons-Thomas, Educational Testing Service, United States</p></fn>

<corresp id="c001">&#x0002A;Correspondence: Katrin Ellen Klieme &#x02709; <email>klieme&#x00040;uni-bremen.de</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Assessment, Testing and Applied Measurement, a section of the journal Frontiers in Education</p></fn></author-notes>
<pub-date pub-type="epub">
<day>06</day>
<month>02</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>8</volume>
<elocation-id>1092714</elocation-id>
<history>
<date date-type="received">
<day>08</day>
<month>11</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>12</day>
<month>01</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Klieme and Schmidt-Borcherding.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Klieme and Schmidt-Borcherding</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<p>Psychological factors play an important role in higher education. With respect to students&#x00027; understanding of scientific research methods, research self-efficacy (RSE) has been studied as a core construct. However, findings on antecedents and outcomes of RSE are oftentimes heterogeneous regarding both theoretical and empirical structures. The present study helps disentangle these findings by (a) establishing and validating an integrated, multi-dimensional assessment of RSE and (b) introducing a developmental perspective on RSE by testing the impact of the disciplinary context and academic seniority on both the mean level and the latent structure of RSE. The construct validity of the new measure was supported based on RSE assessments of 554 German psychology and educational science students. Relations to convergent and discriminant measures were as expected. Measurement invariance and LSEM analyses revealed significant differences in latent model parameters between most sub-groups of training level and disciplinary context. We discuss our findings of measurement non-invariance as a feature rather than a bug by stressing a process-oriented perspective on RSE. In this regard, we outline potential future directions for research and RSE theory development, alongside implications for methods education practice in higher education.</p></abstract>
<kwd-group>
<kwd>research self-efficacy</kwd>
<kwd>assessment</kwd>
<kwd>validity</kwd>
<kwd>measurement invariance</kwd>
<kwd>differentiation</kwd>
<kwd>MGCFA</kwd>
<kwd>local structural equation modeling</kwd>
<kwd>research training</kwd>
</kwd-group>
<counts>
<fig-count count="2"/>
<table-count count="2"/>
<equation-count count="0"/>
<ref-count count="58"/>
<page-count count="14"/>
<word-count count="12038"/>
</counts>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1. Introduction and theory</title>
<p>Teaching the understanding and application of scientific research methods is a central aim of almost every empirical higher education program. Apart from mere research knowledge and skills, psychological factors play an important role in student development. In this context, research self-efficacy (RSE) has been defined as students&#x00027; &#x0201C;confidence in successfully performing tasks associated with conducting research&#x0201D; (Forester et al., <xref ref-type="bibr" rid="B15">2004</xref>, p. 4). Interestingly, RSE is not only an outcome of academic education itself but also a predictor of other desirable outcomes of university education. According to social cognitive career theory (Lent et al., <xref ref-type="bibr" rid="B34">1994</xref>), such outcomes include interest in research, research productivity, and career choice (Livin&#x00163;i et al., <xref ref-type="bibr" rid="B35">2021</xref>).</p>
<p>The available international literature on RSE in higher education is relatively broad. However, the proposed theoretical and empirical structures of RSE are oftentimes inconsistent. First, theoretical conceptions of RSE vary in the number and nature of sub-factors (Forester et al., <xref ref-type="bibr" rid="B15">2004</xref>). Second, some measures even show discrepancies between their theoretically posited and empirically found structures (Forester et al., <xref ref-type="bibr" rid="B15">2004</xref>; Bieschke, <xref ref-type="bibr" rid="B5">2006</xref>), thus calling their validity into question. The present study helps disentangle these findings by (a) establishing and validating an integrated, multi-dimensional measure of RSE and (b) introducing a developmental perspective on RSE by testing the impact of the disciplinary context and academic seniority on both the structure and the level of RSE. Such a developmental perspective on RSE might contribute to explaining the current heterogeneous landscape and, thus, enable a systematic and coherent investigation of RSE antecedents, development, and outcomes. Such findings can help improve methods education by revealing differentiated student needs linked to individual backgrounds and by facilitating diagnostics for mentoring. On a larger scale, a developmental perspective on RSE can help develop and evaluate evidence-based learning settings that take differentiating effects on RSE into account. Deliberately fostering RSE facilitates sustainable educational outcomes such as interest and productivity, beyond mere knowledge and skill generation.</p>
<sec>
<title>1.1. Research as a specific domain of self-efficacy beliefs in higher education</title>
<p>Perceived self-efficacy is conceived as a loosely hierarchical construct in which a general self-efficacy factor is generalized from domain-specific self-efficacy beliefs (Bandura, <xref ref-type="bibr" rid="B2">2006</xref>). All specific self-efficacy beliefs are related to general self-efficacy. However, since specific self-efficacy beliefs develop from domain-specific factors, they also differ from the general factor, as well as from each other (Bandura, <xref ref-type="bibr" rid="B1">1997</xref>). Specific influences are, for example, epistemological beliefs about the domain at hand (Mason et al., <xref ref-type="bibr" rid="B37">2013</xref>), the value a person ascribes to this specific domain (Finney and Schraw, <xref ref-type="bibr" rid="B13">2003</xref>; Bandura, <xref ref-type="bibr" rid="B2">2006</xref>), and, most importantly, domain-specific mastery experience (Bandura, <xref ref-type="bibr" rid="B2">2006</xref>). Thus, self-efficacy varies intra-individually across domains and the formative experiences within each specific realm. Each specific self-efficacy therefore needs individual in-depth theoretical and empirical attention. Here, we focus on research self-efficacy, a sub-domain of academic self-efficacy, in the context of the empirical social sciences.</p>
</sec>
<sec>
<title>1.2. Challenges in research self-efficacy theory and assessment</title>
<p>Despite its importance, RSE theory is not well-developed yet, and the theoretical structures of RSE that have been proposed so far are quite varied (Forester et al., <xref ref-type="bibr" rid="B15">2004</xref>). Most scholars assume a second-order factor structure (Phillips and Russell, <xref ref-type="bibr" rid="B45">1994</xref>; Bieschke et al., <xref ref-type="bibr" rid="B6">1996</xref>; O&#x00027;Brien et al., <xref ref-type="bibr" rid="B42">1998</xref>; Bieschke, <xref ref-type="bibr" rid="B5">2006</xref>). Commonly, the sub-factors represent the different stages in the research process, such as (1) literature review and development of a research question, (2) research design and data collection, (3) data analysis and interpretation, and (4) communication of results to the scientific community. Following Bandura&#x00027;s (<xref ref-type="bibr" rid="B2">2006</xref>) suggestion on self-efficacy assessment, the respective items that are assumed to assess each sub-factor list concrete research tasks (e.g., &#x0201C;develop researchable questions,&#x0201D; &#x0201C;obtain appropriate subjects,&#x0201D; &#x0201C;choose an appropriate method of data analysis,&#x0201D; and &#x0201C;write a thesis&#x0201D;). However, apart from this general assumption, the number and nature of these sub-factors vary between scholars (Forester et al., <xref ref-type="bibr" rid="B15">2004</xref>).</p>
<p>Prominent RSE measures are the Research Self-Efficacy Scale (RSES, Bieschke et al., <xref ref-type="bibr" rid="B6">1996</xref>), the Self-Efficacy in Research Measure (SERM, Phillips and Russell, <xref ref-type="bibr" rid="B45">1994</xref>), the Research Attitudes Measure (RAM, O&#x00027;Brien et al., <xref ref-type="bibr" rid="B42">1998</xref>), and the Research Self-Efficacy Scale (Holden et al., <xref ref-type="bibr" rid="B21">1999</xref>). These have been employed in various studies, as recently summarized in a meta-analysis by Livin&#x00163;i et al. (<xref ref-type="bibr" rid="B35">2021</xref>). While each of these measures is valuable for assessing RSE within its own perspective, the conceptual differences between them invite skepticism about whether results drawn from studies that employ each instrument can be compared and pooled meaningfully.</p>
<p>Bieschke et al. (<xref ref-type="bibr" rid="B6">1996</xref>) propose four factors based on principal components analysis of items that were generated to represent the whole research process. The factors represent self-efficacy regarding research conceptualization, early tasks, research implementation, and presenting the results.</p>
<p>Similarly, Phillips and Russell (<xref ref-type="bibr" rid="B45">1994</xref>) also propose a four-factor structure of RSE, namely self-efficacy regarding research design, practical, writing, and quantitative skills. However, this structure is based on previous results of a principal components analysis of research skills employed by Royalty and Reising (<xref ref-type="bibr" rid="B53">1986</xref>). The respective items are drawn in part from this skill list, as well as from additional theoretical reflections to represent the four factors. Therefore, the content domain that items are sampled from differs from the one targeted by Bieschke et al. (<xref ref-type="bibr" rid="B6">1996</xref>), as do the proposed factor qualities.</p>
<p>Adding to the confusion, O&#x00027;Brien et al. (<xref ref-type="bibr" rid="B42">1998</xref>) propose a six-factor structure of RSE that is based on a PCA of items written to represent the whole research process. These six factors are self-efficacy regarding discipline and intrinsic motivation, analytical skills, preliminary conceptualization skills, writing skills, application of ethics and procedures, and contribution and utilization of resources. Thus, the theoretically targeted content domain that items were drawn from (the whole research process) was the same as the domain targeted by Bieschke et al. (<xref ref-type="bibr" rid="B6">1996</xref>), but empirical analyses yielded a different number of sub-factors (six vs. four).</p>
<p>In sum, in line with differences in theoretical conceptualization, these measures show discrepancies in content and factor structure, not only between measures but even within themselves when results from different studies are compared (Forester et al., <xref ref-type="bibr" rid="B15">2004</xref>; Bieschke, <xref ref-type="bibr" rid="B5">2006</xref>). Such inconsistencies call into question both conceptual and measurement validity and call for the advancement and integration of RSE measurement. In turn, enhanced RSE measurement enables coherent research that produces valid and comparable results.</p>
</sec>
<sec>
<title>1.3. Advances in research self-efficacy assessment: Measurement integration</title>
<p>A first step toward resolving the heterogeneous measurement landscape of RSE was initiated in 2004 by Forester and colleagues. The authors conducted a common factor analysis of 107 items from the three prominent U.S. American RSE measures, the SERM (Phillips and Russell, <xref ref-type="bibr" rid="B45">1994</xref>), the RSES (Bieschke et al., <xref ref-type="bibr" rid="B6">1996</xref>), and the RAM (O&#x00027;Brien et al., <xref ref-type="bibr" rid="B42">1998</xref>). Their analyses provided &#x0201C;information about the dimension of RSE that is not detectable in an analysis of respondents to just one instrument&#x0201D; (Forester et al., <xref ref-type="bibr" rid="B15">2004</xref>, p. 6), thus laying the ground for advances in RSE measurement.</p>
<p>Based on the EFA results from Forester et al. (<xref ref-type="bibr" rid="B15">2004</xref>), the Assessment of Self-Efficacy in Research Questionnaire (ASER, Klieme, <xref ref-type="bibr" rid="B31">2021</xref>) was recently developed as a progression of RSE measurement and theory. Thus, previous progress achieved by various scholars was taken into account instead of starting from scratch. The ASER arrives at a comprehensive, empirically grounded operationalization of RSE by integrating items from the existing heterogeneous approaches. The ASER is explicitly designed for Bachelor and Master students and is available in both German and English versions. Due to its international development context, it thus lays the groundwork for cross-national research. A detailed description of item selection based on EFA factor loadings and psychometric properties is provided by Klieme (<xref ref-type="bibr" rid="B31">2021</xref>).</p>
</sec>
<sec>
<title>1.4. Advances in research self-efficacy theory: Differentiation hypothesis</title>
<p>Due to the promising but heterogeneous research landscape of RSE, further investigation seems beneficial: Is this heterogeneity (a) based on invalid measurement, or can it (b) be explained by a theory on RSE development? The measurement issue was tackled by the ASER development and validation (i.e., eliminating a bug). The present article focuses on a theory-based, developmental account for RSE heterogeneity (i.e., identifying a feature) by employing analyses of measurement (in)variance, taking a non-traditional approach.</p>
<p>Measurement invariance (MI) analyses are employed to test whether a measure&#x00027;s manifest scores represent the same latent construct in different groups, for example, RSE in psychology and educational science students. Commonly, nested models are fitted through multi-group confirmatory factor analyses (MGCFA, J&#x000F6;reskog, <xref ref-type="bibr" rid="B25">1971</xref>), increasingly constraining model parameters to be equal&#x02014;i.e., invariant&#x02014;across groups. Coined by Meredith (<xref ref-type="bibr" rid="B38">1993</xref>), these models represent configural (equal factor structure), metric or weak (equal factor loadings), scalar or strong factorial (equal item intercepts), or strict (equal item residual variances) invariance of model parameters. For a valid comparison of manifest test scores, MI should ideally be at least scalar (Fischer and Karl, <xref ref-type="bibr" rid="B14">2019</xref>). Usually, a lack of MI is regarded as a measurement weakness, and eliminating or reducing it is an important endeavor during test development.</p>
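<p>To make the nested-model logic concrete, the following Python sketch runs the chi-square difference tests that compare each invariance model with the next-less-constrained one. The fit statistics are invented placeholders (in practice they come from dedicated SEM software), so this illustrates only the comparison step, not model estimation.</p>

```python
# Sketch: chi-square difference tests between nested MGCFA models,
# as used in classical measurement-invariance testing.
# NOTE: the fit statistics below are illustrative placeholders,
# not results from this study.
from scipy.stats import chi2

# (model, chi-square, degrees of freedom), ordered from least to most constrained
fits = [
    ("configural", 210.4, 120),  # equal factor structure
    ("metric",     224.9, 134),  # + equal factor loadings
    ("scalar",     261.3, 148),  # + equal item intercepts
    ("strict",     310.8, 162),  # + equal item residual variances
]

def invariance_tests(fits, alpha=0.05):
    """Compare each model with the next-less-constrained one."""
    results = []
    for (_, x0, df0), (name1, x1, df1) in zip(fits, fits[1:]):
        d_chi2, d_df = x1 - x0, df1 - df0
        p = chi2.sf(d_chi2, d_df)  # upper-tail p-value of the difference
        results.append((name1, d_chi2, d_df, p, p >= alpha))
    return results

for name, d_chi2, d_df, p, holds in invariance_tests(fits):
    print(f"{name}: delta-chi2({d_df}) = {d_chi2:.1f}, p = {p:.3f} "
          f"-> {'holds' if holds else 'rejected'}")
```

<p>With these placeholder values, the metric constraints would hold while the scalar and strict constraints would be rejected, mirroring the kind of partial-invariance pattern discussed in the text.</p>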
<p>Another approach to dealing with a lack of MI is to systematically probe its potential reasons. One reason for the fractured picture of research self-efficacy might be that results stem from unidentified heterogeneous populations. If measurement non-invariance occurs between sub-groups of students that have not yet been recognized explicitly, the heterogeneous results might be systematic. Indeed, Fischer and Karl (<xref ref-type="bibr" rid="B14">2019</xref>) urge researchers to value non-invariance findings the same as invariance findings. Such findings might help us understand heterogeneous empirical results in the latent structure estimation of a malleable construct: if the &#x0201C;true&#x0201D; latent structure simply is heterogeneous, so should our empirical estimates be. There are at least two dimensions along which structural differentiation effects should be expected in RSE: students&#x00027; training level in research methods and the different roles and/or amounts of specific methodologies in an academic discipline.</p>
<sec>
<title>1.4.1. Training level</title>
<p>The first possible differentiating effect for RSE is the training level. Self-efficacy beliefs stem from mastery experience (Bandura, <xref ref-type="bibr" rid="B2">2006</xref>). Hence, it can be reasoned that the amount of methods training affects RSE, as expanded methods training allows for extended mastery as well as failure experience&#x02014;a differentiating effect in student self-efficacy. Such experiences of mastery or failure are particularly prevalent in hands-on training settings. Oftentimes, methods training in Bachelor programs is rather theoretical, with hands-on experience increasing in graduate training. Still, any investigation of RSE development and potential differentiation effects should cover all levels of higher education.</p>
<p>First, apart from mastery experience, self-efficacy is affected by epistemological beliefs and the value a person ascribes to the respective domain (Bandura, <xref ref-type="bibr" rid="B2">2006</xref>), namely research. These two factors are probably stressed in undergraduate training already: not only in specific methods classes, but also in subject-matter classes that communicate the relevance of research for theory development. Second, undergraduate research experiences have been employed increasingly over the past years. The momentum of this development is mirrored by the recent publication of the Cambridge Handbook of Undergraduate Research (Mieg et al., <xref ref-type="bibr" rid="B39">2022</xref>), which provides an overview of theoretical approaches as well as practice examples in diverse disciplines and from countries across all continents.</p>
<p>Consequently, from their first semester on, students can be exposed to forces that shape their RSE. These forces may increase and specialize as training advances from undergraduate to graduate training. The amount of training might not only affect mean levels of RSE (Livin&#x00163;i et al., <xref ref-type="bibr" rid="B35">2021</xref>) but also engender differentiation and specification of self-efficacy beliefs regarding the various research tasks and the way they constitute students&#x00027; self-efficacy in this domain. As a consequence, variation in construct structure might be interpreted as a conceptual change of research in the process of methods education. For example, Rochnia and Radisch (<xref ref-type="bibr" rid="B51">2021</xref>) argue that in educational contexts, learning implies change over time (hopefully), both regarding the mean level and construct structure. Thus, measurement non-invariance between training levels does not necessarily indicate low measurement validity (i.e., a bug), but rather a differentiation of a concept due to learning (i.e., a feature; Putnick and Bornstein, <xref ref-type="bibr" rid="B47">2016</xref>; Rochnia and Radisch, <xref ref-type="bibr" rid="B51">2021</xref>). Delineating the amount of measurement (in-) variance might help to understand the intra-individual differentiation of research as a specific domain of self-efficacy across the training level.</p>
</sec>
<sec>
<title>1.4.2. Discipline</title>
<p>A second possible differentiating effect is the research culture in an academic discipline. Although most university training comprises research classes, their focus on specific methodology may vary between disciplines, partly due to differences in their genesis and research targets. For example, psychology and educational science are both empirical social sciences and share some overlap (e.g., educational psychology). However, even disciplines that appear akin at first sight may differ regarding their methodological gist. In German academia, a traditional focus on qualitative methods in educational science still holds strong today, and qualitative methods are employed and refined in current research, whereas psychological research employs a wider range of rather advanced quantitative methods. Consequently, most German undergraduate methods training in psychology focuses almost exclusively on quantitative methods, while methods training in educational science stresses qualitative methods and varies much more across universities regarding the extent of undergraduate methods training. These and other potential disciplinary differences may foster different value perceptions and epistemological beliefs regarding research in students, which affect their self-efficacy beliefs (Bandura, <xref ref-type="bibr" rid="B2">2006</xref>; Mason et al., <xref ref-type="bibr" rid="B37">2013</xref>).</p>
<p>One may then ask how broadly RSE can be assessed invariantly across disciplines such as psychology and educational science. Put differently, do differences between disciplines evoke differentiated development of RSE in their students?</p>
</sec>
<sec>
<title>1.4.3. Discipline by training interaction</title>
<p>Disciplines might not only differ in the actual tasks or research methods but also in the emphasis given to a discipline&#x00027;s methods. As a consequence, if methods education is emphasized more strongly in one discipline than in another, differences between disciplines might increase over time as students are socialized into the research culture of their respective programs. This might lead to an interaction effect of discipline and time in the program, as socialization and differences in research-specific mastery experiences increase over time.</p>
</sec>
</sec>
<sec>
<title>1.5. Validity of empirical results</title>
<p>The main focus of the study is to account for RSE heterogeneity. Nevertheless, the validity of measurement is an indispensable prerequisite for any research that aims to disentangle the heterogeneous findings that float around the RSE research realm. Because the ASER is a relatively new measure, we will delineate validity evidence before addressing the differentiation hypotheses central to this article.</p>
<sec>
<title>1.5.1. Construct validity</title>
<p>This study was conducted in Germany with a German-speaking sample.<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> Thus, same-construct measures used for convergent validity estimation should ideally be worded and validated in German so that they provide a legitimate anchor for the ASER items. Unfortunately, validated measures of RSE are missing at the Bachelor&#x00027;s and Master&#x00027;s levels in the German-speaking realm. Lachmann et al. (<xref ref-type="bibr" rid="B33">2018</xref>) developed an instrument that assesses RSE in Ph.D. graduates and targets attainments that are not appropriate for the Bachelor&#x00027;s and Master&#x00027;s levels (e.g., &#x0201C;I can build cooperations with central researchers in my field&#x0201D;). For younger students, RSE has mostly been included as a side note with unvalidated <italic>post-hoc</italic> items, or with a divergent structural focus (e.g., Gess et al., <xref ref-type="bibr" rid="B17">2018</xref>; Pfeiffer et al., <xref ref-type="bibr" rid="B44">2018</xref>). Still, items from these two studies seem to be the best option for selecting convergent validity items: They assess RSE specifically, and they were developed to assess German students at a pre-doctoral level.</p>
<p>Estimating the discriminant validity of the ASER requires constructs that are theoretically and empirically related to, but still distinct from, RSE (Clark and Watson, <xref ref-type="bibr" rid="B10">2019</xref>) to prevent a jangle fallacy.<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> Those constructs are general self-efficacy (Bandura, <xref ref-type="bibr" rid="B2">2006</xref>) and academic self-concept. While self-concept is rather general and past-informed, self-efficacy is more task-specific and future-oriented. However, both target self-beliefs (Choi, <xref ref-type="bibr" rid="B9">2005</xref>). A further construct with theoretically and empirically positive relations to RSE is attitudes toward research (Kahn and Scott, <xref ref-type="bibr" rid="B29">1997</xref>; Livin&#x00163;i et al., <xref ref-type="bibr" rid="B35">2021</xref>).</p>
<p>Relations between RSE and neuroticism have not yet been tested empirically, but can be reasoned to be negative, as low self-esteem and a heightened sensitivity toward failure (Zhao et al., <xref ref-type="bibr" rid="B58">2010</xref>) impede mastery experience, which is a core requisite for acquiring (research) self-efficacy (Bandura, <xref ref-type="bibr" rid="B2">2006</xref>). Indeed, the negative relations with neuroticism that have been found for academic self-efficacy (Stajkovic et al., <xref ref-type="bibr" rid="B54">2018</xref>) support this hypothesis regarding research self-efficacy.</p>
<p>Research self-efficacy construct validity is also supported by a lack of relation with constructs that are theoretically unrelated to RSE. Correspondingly, several authors report non-significant near-null correlations between agreeableness and academic self-concept (Marsh et al., <xref ref-type="bibr" rid="B36">2006</xref>) or self-efficacy regarding college credits (De Feyter et al., <xref ref-type="bibr" rid="B11">2012</xref>). From a theoretical standpoint, the tendency to help others or not to be harsh should not affect a person&#x00027;s confidence in performing research tasks. Thus, based on theoretical reflections and considerations of empirical findings on the relation between agreeableness and other self-efficacies, a lack of relation between RSE and agreeableness would support RSE construct validity.</p>
</sec>
<sec>
<title>1.5.2. Test bias regarding gender</title>
<p>Test bias regarding gender will be explored in order to delineate whether the complete sample can be used for our analyses. Even today, the ratio of female to male students presumably differs between programs. It is, therefore, important to detect potential test bias before all students are analyzed in a common model. Meta-analytic results on gender mean differences in academic self-efficacy show an overall effect size of 0.08 favoring male students, specifically regarding mathematics and social-sciences self-efficacy (Huang, <xref ref-type="bibr" rid="B22">2013</xref>). In contrast, in a meta-analysis on the relation specifically between research self-efficacy and gender, Livin&#x00163;i et al. (<xref ref-type="bibr" rid="B35">2021</xref>) report non-significant differences between male and female students. However, those (non-) differences refer to manifest mean scores and can only be validly interpreted under unbiased measurement.</p>
<p>In addition, if MI can be established between female and male students, this indicates that the ASER is generally fit for valid invariant measurement. Consequently, a lack of MI between certain groups, as tested under the differentiation hypothesis, can be attributed to real group differences in measurement and probably does not stem from weaknesses of the instrument.</p>
</sec>
</sec>
<sec>
<title>1.6. Research purpose and hypotheses</title>
<p>Taken together, the aim of this study is pursued in two steps. First, the ASER will be evaluated as an integrative assessment progression, testing construct validity in the nomological net and gender bias. This way, the empirical usefulness of the ASER as a measure of RSE and the validity of the following analyses can be supported. Second, potential differentiating effects will be tested through measurement invariance (MI) analyses. These analyses will investigate potential reasons for the heterogeneous results reported in the literature so far. In particular, the following hypotheses will be tested in order to evaluate the ASER validity (hypotheses 1&#x02013;4) and to identify a structural differentiation of the RSE construct (hypotheses 5&#x02013;7).</p>
<sec>
<title>1.6.1. ASER validity</title>
<p>H1&#x02014;Convergent validity: Satisfactory fit of a comprehensive measurement model of RSE including items from different RSE measures is expected.</p>
<p>H2a&#x02013;e&#x02014;Discriminant validity: RSE will display moderate to high correlations with general self-efficacy, academic self-concept, research attitudes, and neuroticism (negative relation). The relation between agreeableness and RSE will be non-significant.</p>
<p>H3&#x02014;Construct validity: Convergent validity coefficients will be higher than discriminant validity coefficients.</p>
<p>H4&#x02014;Test bias: MI is expected between female and male students. Results will indicate whether further analyses should be conducted for both genders together or separately.</p>
</sec>
<sec>
<title>1.6.2. Structural differentiation</title>
<p>H5: Limited measurement invariance is expected between different training levels.</p>
<p>H6: Limited measurement invariance is expected between psychology and educational science students.</p>
<p>H7: An interaction effect of training level and discipline on the level of MI is expected.</p>
</sec>
</sec>
</sec>
<sec id="s2">
<title>2. Method</title>
<sec>
<title>2.1. Sampling and data preparation</title>
<p>Between June 2019 and January 2020, 648 students of psychology and educational science from various German universities filled out the questionnaires either during a lecture as a paper&#x02013;pencil assessment, or online. Informed consent was confirmed by all participants before starting the questionnaire.</p>
<p>The initial data were prepared for analysis. Missing values were examined only for items on the psychological scales. Fifty-two cases with more than 10% missing values on the ASER items (i.e., at least two of the 19 items) were excluded from further analyses. The remaining data set (<italic>N</italic> = 596) contained 0.20% missing values on ASER items. No single ASER item showed &#x02265;1% missing values. Missing value analysis suggested that data were missing completely at random (MCAR) according to a non-significant Little&#x00027;s MCAR test [<inline-formula><mml:math id="M1"><mml:msubsup><mml:mrow><mml:mi>&#x003C7;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>345</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> = 360.12, <italic>p</italic> &#x0003E; 0.10]. Thus, EM imputation of missing values was performed (Tabachnick and Fidell, <xref ref-type="bibr" rid="B55">2013</xref>).</p>
<p>Univariate and multivariate outliers were examined for each ASER item individually and for scale scores on the other constructs. Twenty-five cases with univariate outliers were identified via <italic>z</italic>-scores (|<italic>z</italic>| &#x0003E; 3.11); the five cases with more than two extreme scores were deleted, while cases with one or two extreme scores were retained in the sample (Tabachnick and Fidell, <xref ref-type="bibr" rid="B55">2013</xref>). Based on Cook&#x00027;s distance (&#x0003C; 0.03 for all variables) as a more robust indicator of multivariate outliers, none were found in the sample.</p>
<p>Lastly, 37 students in the sample turned out to be neither psychology nor educational science majors in Bachelor or Master programs and were excluded from further analyses. Thus, the final sample comprised 554 students with a mean age of 23.72 years (<italic>SD</italic> = 4.53), of whom 62% identified as female students, 33% as male students, and 4% as non-binary, while 1% did not indicate their gender. Participants were sampled from different programs in psychology (56.5%) and educational science (43.5%) at different universities in Germany (48% from Frankfurt a. M., 44% from Bremen, and 8% from others such as Jena, Kiel, or Freiburg PH). In this study, 68% were Bachelor students and 32% were Master students, with a mean of 5.87 (<italic>SD</italic> = 4.22) total cumulative semesters at university, including any Bachelor and Master programs taken.</p>
</sec>
<sec>
<title>2.2. Measures</title>
<p>Research self-efficacy was assessed by the ASER, comprising 19 items. Students rated their &#x0201C;confidence in successfully performing the following tasks&#x0201D; on a Likert scale from 0 (&#x0201C;not at all confident&#x0201D;) to 10 (&#x0201C;completely confident&#x0201D;). Internal consistencies were good, with Cronbach&#x00027;s &#x003B1; = 0.94 and McDonald&#x00027;s &#x003C9;<sub><italic>h</italic></sub> = 0.77 for a general factor model (all ASER items). Parallel analysis suggested a three-factor model, for which McDonald&#x00027;s &#x003C9;<sub><italic>t</italic></sub> was 0.95 (for sub-factors and items see <xref ref-type="table" rid="T1">Table 1</xref>). McDonald&#x00027;s &#x003C9; is interpreted similarly to Cronbach&#x00027;s &#x003B1; but allows for different item loadings on the scale factor instead of assuming all loadings to be equal (as Cronbach&#x00027;s &#x003B1; does). Based on these &#x003C9; values, the ASER sub-scales should be used whenever possible, but a general one-factor RSE model is also reliable. In order to estimate the ASER&#x00027;s convergent validity, an additional assessment of RSE was obtained with 12 same-construct items: nine items originally developed by Gess et al. (<xref ref-type="bibr" rid="B17">2018</xref>, e.g., &#x0201C;Analyze data qualitatively, even if I have never used this specific method before&#x0201D;) and three items developed by Pfeiffer et al. (<xref ref-type="bibr" rid="B44">2018</xref>, e.g., &#x0201C;Plan a research project&#x0201D;).</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>ASER items and CFA parameter estimations of the three-factor ASER model.</p></caption>
<table frame="box" rules="all">
<thead><tr style="background-color:#919498;color:#ffffff">
<th valign="top" align="left"><bold>Item</bold></th>
<th valign="top" align="left"><bold>German wording</bold></th>
<th valign="top" align="left"><bold>English wording</bold></th>
<th valign="top" align="center" colspan="3"><bold>Estimated factor parameters</bold></th>
</tr>
</thead>
<tbody>
<tr style="background-color:#919498;color:#ffffff">
<td/>
<td/>
<td/>
<td valign="top" align="center" colspan="3">&#x003BB;</td>
</tr>
 <tr style="background-color:#919498;color:#ffffff">
<td/>
<td/>
<td/>
<td valign="top" align="center"><bold>I</bold></td>
<td valign="top" align="center"><bold>II</bold></td>
<td valign="top" align="center"><bold>III</bold></td>
</tr> <tr>
<td valign="top" align="left">ASER11theory</td>
<td valign="top" align="left">Schl&#x000FC;ssig begr&#x000FC;ndete Forschungsideen ausarbeiten</td>
<td valign="top" align="left">Develop a logical rationale for your particular research idea.</td>
<td valign="top" align="center">0.81</td>
<td/>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER12theory</td>
<td valign="top" align="left">Den Diskussionsteil f&#x000FC;r meine Abschlussarbeit schreiben</td>
<td valign="top" align="left">Writing a discussion section for my thesis</td>
<td valign="top" align="center">0.79</td>
<td/>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER6theory</td>
<td valign="top" align="left">Einleitung, Theorieteil und Diskussionsteil meiner Abschlussarbeit schreiben</td>
<td valign="top" align="left">Write the introduction, literature review, and discussion for my thesis</td>
<td valign="top" align="center">0.77</td>
<td/>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER18theory</td>
<td valign="top" align="left">Selbstst&#x000E4;ndig einen Forschungsartikel schreiben</td>
<td valign="top" align="left">Write a research article on my own</td>
<td valign="top" align="center">0.73</td>
<td/>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER17theory</td>
<td valign="top" align="left">Geplante Forschungsideen begr&#x000FC;nden</td>
<td valign="top" align="left">Reason planned research ideas.</td>
<td valign="top" align="center">0.72</td>
<td/>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER14theory</td>
<td valign="top" align="left">Auf Basis der gelesenen Literatur Bereiche identifizieren, die (weiterer) Forschung bed&#x000FC;rfen</td>
<td valign="top" align="left">Identify areas of needed research, based on reading the literature.</td>
<td valign="top" align="center">0.70</td>
<td/>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER4theory</td>
<td valign="top" align="left">Den Verlauf eines Forschungsprojektes dokumentieren</td>
<td valign="top" align="left">Keep records during my research project.</td>
<td valign="top" align="center">0.66</td>
<td/>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER9theory</td>
<td valign="top" align="left">Eine einordnende Begutachtung (&#x0201C;review&#x0201D;) der aktuellen Literatur eines interessanten Forschungsbereichs schreiben</td>
<td valign="top" align="left">Write a literature review in an area of research interest.</td>
<td valign="top" align="center">0.63</td>
<td/>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER1theory</td>
<td valign="top" align="left">Fragen entwickeln, die sich zur Erforschung eignen</td>
<td valign="top" align="left">Generate researchable questions.</td>
<td valign="top" align="center">0.63</td>
<td/>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER3emp</td>
<td valign="top" align="left">Angemessene Auswertungsmethoden ausw&#x000E4;hlen</td>
<td valign="top" align="left">Choose appropriate data analysis techniques.</td>
<td/>
<td valign="top" align="center">0.84</td>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER10emp</td>
<td valign="top" align="left">Wissen, welche Auswertungsmethode zu benutzen ist</td>
<td valign="top" align="left">Know which data analysis method to use.</td>
<td/>
<td valign="top" align="center">0.80</td>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER19emp</td>
<td valign="top" align="left">Alle wichtigen Details der Datenerhebung beachten</td>
<td valign="top" align="left">Attend to all relevant details of data collection.</td>
<td/>
<td valign="top" align="center">0.76</td>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER15emp</td>
<td valign="top" align="left">Die Zuverl&#x000E4;ssigkeit der Daten &#x000FC;ber verschiedene Erhebungen, Rater&#x0002A;innen und/oder Instrumente hinweg gew&#x000E4;hrleisten</td>
<td valign="top" align="left">Ensure data collection is reliable across trials, raters, and equipment.</td>
<td/>
<td valign="top" align="center">0.75</td>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER8emp</td>
<td valign="top" align="left">Daten erheben</td>
<td valign="top" align="left">Collect data.</td>
<td/>
<td valign="top" align="center">0.68</td>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER5emp</td>
<td valign="top" align="left">Ein &#x000FC;bliches Computerprogramm zur Datenauswertung nutzen (z.B. MAXQDA/SPSS/R)</td>
<td valign="top" align="left">Use an existing computer package to analyze data (e.g., MaxQDA, SPSS, and R).</td>
<td/>
<td valign="top" align="center">0.56</td>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER2emp</td>
<td valign="top" align="left">Zur Beantwortung meiner Forschungsfrage geeignete Proband&#x0002A;innen/Teilnehmer&#x0002A;innen gewinnen.</td>
<td valign="top" align="left">Obtain appropriate subjects for my study.</td>
<td/>
<td valign="top" align="center">0.38</td>
<td/>
</tr> <tr>
<td valign="top" align="left">ASER16emp</td>
<td valign="top" align="left">Die Ergebnisse meiner Datenauswertung interpretieren</td>
<td valign="top" align="left">Interpret results of my analyses.</td>
<td/>
<td/>
<td valign="top" align="center">0.89</td>
</tr> <tr>
<td valign="top" align="left">ASER13emp</td>
<td valign="top" align="left">Ergebnisse der Datenauswertung verstehen</td>
<td valign="top" align="left">Understand data analysis results.</td>
<td/>
<td/>
<td valign="top" align="center">0.84</td>
</tr> <tr>
<td valign="top" align="left">ASER7emp</td>
<td valign="top" align="left">Meine Forschungsergebnisse verstehen und interpretieren</td>
<td valign="top" align="left">Interpret and understand the results of my research.</td>
<td/>
<td/>
<td valign="top" align="center">0.83</td>
</tr> <tr style="background-color:#e0e1e3">
<td valign="top" align="left" colspan="3"><bold>Latent factor correlations</bold></td>
<td valign="top" align="left" colspan="2">&#x003C1;</td>
<td/>
</tr> <tr>
<td valign="top" align="left">II</td>
<td/>
<td/>
<td valign="top" align="center">0.75</td>
<td/>
<td/>
</tr> <tr>
<td valign="top" align="left">III</td>
<td/>
<td/>
<td valign="top" align="center">0.76</td>
<td valign="top" align="center">0.83</td>
<td/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>I, Theoretical Aspects factor; II, Empirical Aspects factor; III, Interpretation factor. All <italic>p</italic> &#x0003C; 0.001.</p>
</table-wrap-foot>
</table-wrap>
<p>General self-efficacy was assessed by three items rated on an 11-point Likert scale (e.g., &#x0201C;In difficult situations, I can rely on my abilities;&#x0201D; Beierlein et al., <xref ref-type="bibr" rid="B4">2012</xref>). The authors report a reliability of <italic>r</italic><sub><italic>tt</italic></sub> = 0.50 as well as satisfactory construct and factorial validity. In this sample, parallel analysis suggested a one-factor model with McDonald&#x00027;s &#x003C9;<sub><italic>t</italic></sub> = 0.74.</p>
<p>General academic self-concept was measured by a five-item scale, developed by Dickh&#x000E4;user et al. (<xref ref-type="bibr" rid="B12">2002</xref>) with McDonald&#x00027;s &#x003C9;<sub><italic>t</italic></sub> = 0.83 in this study. Students rate their perceived abilities on a seven-point Likert scale from &#x0201C;very low&#x0201D; to &#x0201C;very high&#x0201D; (e.g., &#x0201C;I think my abilities in this program are...&#x0201D;).</p>
<p>Agreeableness and neuroticism were assessed on an 11-point Likert scale by four items each, taken from the short version of the Big Five Inventory (Rammstedt and John, <xref ref-type="bibr" rid="B49">2005</xref>), with Cronbach&#x00027;s &#x003B1; ranging from 0.74 to 0.77 for neuroticism (McDonald&#x00027;s &#x003C9;<sub><italic>h</italic></sub> = 0.79 in this sample) and from 0.59 to 0.64 for agreeableness (McDonald&#x00027;s &#x003C9;<sub><italic>h</italic></sub> = 0.68). Considering the shortness of the scales and the indicators of strong factorial and construct validity, the psychometric properties are acceptable.</p>
<p>Research attitudes were measured by the Revised Attitudes Toward Research Scale (Papanastasiou, <xref ref-type="bibr" rid="B43">2014</xref>), a 13-item measure. In this sample, parallel analysis suggested a model with four factors or two components, and EFA revealed an interpretable two-factor solution with all positive and all negative attitude items loading onto the two factors, respectively. Thus, this solution was used in this study, with McDonald&#x00027;s &#x003C9;<sub><italic>t</italic></sub> = 0.89 for the two-factor model. Sample items are &#x0201C;Research is connected to my field of study&#x0201D; and &#x0201C;Research makes me nervous.&#x0201D;</p>
<p>Furthermore, prior training was estimated by the total number of semesters at university. Demographics included age, gender, current program and institution, and information on degrees.</p>
</sec>
<sec>
<title>2.3. Data analysis</title>
<p>Measurement and construct validity covariance structure models were fitted using the lavaan (Rosseel, <xref ref-type="bibr" rid="B52">2012</xref>) and lavaan.survey (Oberski, <xref ref-type="bibr" rid="B41">2014</xref>) packages in R (R Core Team, <xref ref-type="bibr" rid="B48">2022</xref>).</p>
<p>Measurement invariance (MI) between categorical groups (gender, discipline, and program level) was analyzed by multi-group confirmatory factor analysis (MGCFA, J&#x000F6;reskog, <xref ref-type="bibr" rid="B25">1971</xref>). MI judgment was based on changes in fit statistics between the configural, metric, scalar, and strict models as suggested by Chen (<xref ref-type="bibr" rid="B7">2007</xref>) with cut-offs of &#x00394;CFI &#x0003C; 0.01, &#x00394;RMSEA &#x0003C; 0.015, and &#x00394;SRMR &#x0003C; 0.03 (metric MI) or &#x00394;SRMR &#x0003C; 0.015 (scalar and strict MI). RMSEA was considered cautiously because it tends to over-reject correct models in small samples (Chen et al., <xref ref-type="bibr" rid="B8">2008</xref>).</p>
<p>In the case of non-categorical groups, MGCFA has the disadvantage of losing information by categorizing continuous &#x0201C;grouping&#x0201D; variables into circumscribed groups which are (a) potentially variant within themselves and (b) in many cases arbitrarily divided (Hildebrandt et al., <xref ref-type="bibr" rid="B19">2016</xref>). Local structural equation modeling (LSEM, Hildebrandt et al., <xref ref-type="bibr" rid="B20">2009</xref>) overcomes this issue by testing continuous moderator effects on model parameters, e.g., across the number of semesters. Furthermore, LSEM can identify the onset of potential differences without requiring researchers to specify a moderating function a priori (Hildebrandt et al., <xref ref-type="bibr" rid="B19">2016</xref>). Thus, in addition, (in)variance of model parameters across different levels of scientific training (operationalized by the number of semesters) was investigated with LSEM. LSEM analyses were computed in R using the sirt package (Robitzsch, <xref ref-type="bibr" rid="B50">2015</xref>) with the wrapper function lsem.estimate as well as the function lsem.permutationTest (Hildebrandt et al., <xref ref-type="bibr" rid="B19">2016</xref>). For an accessible introduction to this relatively recent method, we refer to Hildebrandt et al. (<xref ref-type="bibr" rid="B20">2009</xref>, <xref ref-type="bibr" rid="B19">2016</xref>).</p>
</sec>
</sec>
<sec id="s3">
<title>3. Results</title>
<sec>
<title>3.1. ASER validity</title>
<sec>
<title>3.1.1. Construct validity</title>
<p>The two-factor structure reported by Klieme (<xref ref-type="bibr" rid="B31">2021</xref>) could not be confirmed in this study. Here, the ASER items were better represented by a non-hierarchical three-factor model [<inline-formula><mml:math id="M2"><mml:msubsup><mml:mrow><mml:mi>&#x003C7;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>149</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> = 711.746, <italic>p</italic> &#x0003C; 0.001, CFI = 0.911, RMSEA = 0.083, SRMR = 0.048], which can be inspected in <xref ref-type="table" rid="T1">Table 1</xref>. This only moderate overall fit might be explained by non-invariance between sub-groups, which renders a very good model fit unattainable when the whole sample is analyzed together. Thus, differentiating effects between certain groups are still worth analyzing, even if the overall model fit with data comprising all these potentially heterogeneous groups indicates some issues. Consequences hereof will be addressed in the following MI analyses and the discussion. The three RSE sub-factors refer to Theoretical Aspects of research (nine items, e.g., &#x0201C;Generate researchable questions&#x0201D; or &#x0201C;Write a discussion section for my thesis&#x0201D;), Technical Aspects (seven items, e.g., &#x0201C;Collect data&#x0201D; or &#x0201C;Know which data analysis method to use&#x0201D;), and Interpretation (three items, e.g., &#x0201C;Interpret and understand the results of my research&#x0201D;). The 12 convergent validation items were best represented by individual factors corresponding to their publication of origin (Gess et al., <xref ref-type="bibr" rid="B17">2018</xref>; Pfeiffer et al., <xref ref-type="bibr" rid="B44">2018</xref>, respectively).</p>
<p>Construct validity was investigated by a comprehensive model comprising a measurement and a structure model (see <xref ref-type="fig" rid="F1">Figure 1</xref>). RSE was modeled as a second-order factor with five sub-factors: three ASER sub-factors and one sub-factor for each set of validation items (Gess et al., <xref ref-type="bibr" rid="B17">2018</xref>; Pfeiffer et al., <xref ref-type="bibr" rid="B44">2018</xref>). In the structure model, discriminant constructs were modeled as one-factorial. The exception was attitudes toward research, which showed a hierarchical structure with a positive and a negative attitudes sub-factor. Model fit was again moderate to poor [<inline-formula><mml:math id="M3"><mml:msubsup><mml:mrow><mml:mi>&#x003C7;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>429</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> = 4,539.186, <italic>p</italic> &#x0003C; 0.001, CFI = 0.838, RMSEA<sub>CI95%</sub> = (0.056; 0.058), SRMR = 0.059]. Still, the loadings of the five sub-factors on the RSE super factor were all similar in size (see <xref ref-type="fig" rid="F1">Figure 1</xref>). Thus, the ASER items represent RSE similarly to the validation items (H1). Latent factor correlations between the ASER factor and validation factors 1 and 2 were 0.92 (<italic>p</italic> &#x0003C; 0.001) and 0.86 (<italic>p</italic> &#x0003C; 0.001), respectively (for factor naming see <xref ref-type="fig" rid="F1">Figure 1</xref>).</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Conceptual construct validity model with respective parameter estimates.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="feduc-08-1092714-g0001.tif"/>
</fig>
<p>The relations in the nomological net hypothesized in H2 were confirmed, supporting the positioning of RSE in the nomological net. Validity coefficients are shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. Corresponding to H3, correlations supporting convergent validity were stronger than those supporting discriminant validity.</p>
</sec>
<sec>
<title>3.1.2. Gender bias</title>
<p>Measurement bias was tested through MI analyses using MGCFA. MI interpretation was based on differences in fit statistics between the increasingly constrained models, with cut-offs of &#x00394;CFI &#x0003C; 0.01, &#x00394;RMSEA &#x0003C; 0.015, and &#x00394;SRMR &#x0003C; 0.015, as suggested by Chen (<xref ref-type="bibr" rid="B7">2007</xref>). In addition, chi-square difference tests were considered in case of unclear results. However, since they tend to be overly conservative (Putnick and Bornstein, <xref ref-type="bibr" rid="B47">2016</xref>), they were interpreted with caution.</p>
<p>Subjects who identified as non-binary were excluded because this group was too small (<italic>N</italic> = 23), as were seven subjects who did not indicate any gender. For the remaining 524 participants (183 male students and 341 female students), the ASER three-factor model displayed satisfactory fit in the MGCFA configural model [<inline-formula><mml:math id="M4"><mml:msubsup><mml:mrow><mml:mi>&#x003C7;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>149</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> = 699.938, <italic>p</italic> &#x0003C; 0.001, CFI = 0.908, RMSEA = 0.084, SRMR = 0.049]. Strict measurement invariance between female and male students was found, meaning that both groups displayed invariant factor loadings, item intercepts, and residual variances (for model statistics see <xref ref-type="supplementary-material" rid="SM1">Supplementary material B</xref>). Thus, the latent factor scores can be compared between genders (see <xref ref-type="table" rid="T2">Table 2</xref>). Differences in mean factor scores between female and male students were significant for the Technical Aspects factor only (<italic>p</italic> = 0.016), with manifest mean values of <italic>M</italic> = 5.25 (female students) vs. <italic>M</italic> = 5.60 (male students). The mean factor score differences were non-significant for the Theoretical Aspects factor and the Interpretation factor.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Latent mean comparisons on ASER sub-factors for gender, discipline, and program level.</p></caption>
<table frame="box" rules="all">
<thead><tr style="background-color:#919498;color:#ffffff">
<th valign="top" align="left"><bold>Sub scale</bold></th>
<th valign="top" align="center" colspan="2"><bold>Gender</bold><xref ref-type="table-fn" rid="TN1"><sup><bold>a</bold></sup></xref></th>
<th valign="top" align="center" colspan="2"><bold>Discipline</bold><xref ref-type="table-fn" rid="TN1"><sup><bold>a</bold></sup></xref></th>
<th valign="top" align="center" colspan="2"><bold>Program level</bold><xref ref-type="table-fn" rid="TN2"><sup><bold>b</bold></sup></xref></th>
</tr>
</thead>
<tbody>
<tr style="background-color:#919498;color:#ffffff">
<td/>
<td valign="top" align="center"><bold>Female&#x02014;male</bold></td>
<td valign="top" align="center"><italic><bold>p</bold></italic></td>
<td valign="top" align="center"><bold>Psych.&#x02014;Ed.Sc</bold>.</td>
<td valign="top" align="center"><italic><bold>p</bold></italic></td>
<td valign="top" align="center"><bold>Master&#x02014;bachelor</bold></td>
<td valign="top" align="center"><italic><bold>p</bold></italic></td>
</tr> <tr>
<td valign="top" align="left">Theoretical Aspects</td>
<td valign="top" align="center">&#x02212;0.13</td>
<td valign="top" align="center">0.478</td>
<td valign="top" align="center">0.35</td>
<td valign="top" align="center">0.028</td>
<td valign="top" align="center">x</td>
<td valign="top" align="center">x</td>
</tr> <tr>
<td valign="top" align="left">Technical Aspects</td>
<td valign="top" align="center">&#x02212;0.36</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">x</td>
<td valign="top" align="center">x</td>
<td valign="top" align="center">0.56</td>
<td valign="top" align="center">0.000</td>
</tr> <tr>
<td valign="top" align="left">Interpretation</td>
<td valign="top" align="center">&#x02212;0.30</td>
<td valign="top" align="center">0.115</td>
<td valign="top" align="center">0.00</td>
<td valign="top" align="center">0.975</td>
<td valign="top" align="center">0.65</td>
<td valign="top" align="center">0.000</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN1"><label>a</label><p>Comparison from the strict model.</p></fn>
<fn id="TN2"><label>b</label><p>Comparison from the scalar model, x = MI was too low for valid comparisons.</p></fn>
</table-wrap-foot>
</table-wrap>
</sec>
</sec>
<sec>
<title>3.2. Structural differentiation</title>
<p>As a central purpose of this study, structural differentiation was tested through MI analyses of training level and academic discipline.<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref> In the first step, MGCFAs were conducted for categorical group comparisons. The MGCFA configural model was the same for the analyses of both grouping variables and included all subjects (<italic>N</italic> = 554). It displayed satisfactory fit [<inline-formula><mml:math id="M5"><mml:msubsup><mml:mrow><mml:mi>&#x003C7;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>149</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> = 711.746, <italic>p</italic> &#x0003C; 0.001, CFI = 0.911, RMSEA = 0.083, SRMR = 0.049]. Again, judgments were based mainly on changes in fit statistics. <xref ref-type="fig" rid="F2">Figure 2</xref> displays the findings of the MI analyses for main and interaction grouping. For clarity, results are condensed to show the main trends. Detailed fit statistics can be inspected in <xref ref-type="supplementary-material" rid="SM1">Supplementary material B</xref>. In a second step, LSEM on training level was employed exploratorily to determine specific points of measurement (in)variance.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Levels of measurement invariance between discipline and program level for total and sub-groups.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="feduc-08-1092714-g0002.tif"/>
</fig>
<sec>
<title>3.2.1. Measurement invariance regarding discipline: Psychology vs. educational science</title>
<p>Overall, strict MI was found between psychology and educational science students, that is, invariant factor loadings, item intercepts, and residual variances. For the sub-factors, MI was scalar (invariant factor loadings and item intercepts) for the Theoretical Aspects factor and strict for the Interpretation factor. Thus, means can be compared between the two disciplines on these two sub-factors (see <xref ref-type="table" rid="T2">Table 2</xref>). For the Technical Aspects factor, only configural MI held, indicating that the latent meaning of this factor differs between psychology and educational science students due to differences in factor loadings. Manifest scores should, thus, not be used for group comparisons. The mean factor scores were significantly different for Theoretical Aspects (<italic>p</italic> = 0.017; manifest <italic>M</italic> = 6.13 and <italic>M</italic> = 5.79, respectively) but not for Interpretation (manifest <italic>M</italic> = 6.30 for both groups). In conclusion, MI results between the two disciplines varied across the three sub-factors.</p>
</sec>
<sec>
<title>3.2.2. Measurement invariance regarding program level: Bachelor vs. Master</title>
<sec>
<title>3.2.2.1. MGCFA</title>
<p>Overall, MI between program levels was found at the scalar level, that is, invariant factor loadings and item intercepts. For the sub-factors, MI was strict for the Technical Aspects and Interpretation factors and metric for the Theoretical Aspects factor. Thus, test scores can be validly compared between program levels on the Technical Aspects and the Interpretation factor, while the Theoretical Aspects factor differs in meaning between Bachelor and Master students. Differences in mean factor scores were significant for both the Technical Aspects and the Interpretation factor, with <italic>p</italic> &#x0003C; 0.001 for both (see <xref ref-type="table" rid="T2">Table 2</xref>). Manifest scale means for Bachelor and Master students were <italic>M</italic> = 5.19 and <italic>M</italic> = 5.72 on the Technical Aspects scale, and <italic>M</italic> = 6.11 and <italic>M</italic> = 6.71 on the Interpretation scale, respectively. Again, MI results between program levels were mixed.</p>
</sec>
<sec>
<title>3.2.2.2. Local structural equation modeling</title>
<p>For a more detailed analysis, LSEM was conducted with training level as a moderator of the model parameters. This way, changes in factor interpretation across training levels can be analyzed continuously, and potential areas or onsets of change can be delineated. Here, training level was operationalized as the number of semesters.</p>
<p>For the LSEM analyses, only cases without missing values on the analyzed variables (ASER items and the number of semesters) were included, resulting in a sample size of <italic>N</italic> = 519. Focal points for the local model estimations were chosen at values of 1&#x02013;10 on the moderator, with an interval of one semester. This ensured a sufficiently large effective sample size for each local analysis, ranging between <italic>N</italic><sub><italic>eff</italic></sub> = 63.36 (10 semesters) and <italic>N</italic><sub><italic>eff</italic></sub> = 151.50 (three semesters), with a mean of 123.55 (<italic>SD</italic> = 27.92). The number of permutations for testing the global and pointwise hypotheses was set to 1,000.</p>
<p>Fit indices for the locally estimated models were moderate to poor (see <xref ref-type="supplementary-material" rid="SM2">Supplementary material C</xref> for the results of the LSEM analyses). Overall model fit changed across training levels, indicated by significant variation in SRMR (<italic>M</italic> = 0.065, <italic>SD</italic> = 0.052, <italic>p</italic> &#x0003C; 0.05). At the &#x003B1; = 0.05 level, variation in loadings was significant overall for three items, and variation in residual variances was significant overall only for ASER6theory (&#x0201C;Write the introduction, theory, and discussion part of my thesis&#x0201D;). The three items whose loadings varied across semesters were ASER12theory (&#x0201C;write the discussion section for my thesis&#x0201D;), ASER1theory (&#x0201C;develop researchable questions&#x0201D;), and ASER2emp (&#x0201C;sample participants&#x0201D;). This means that the latent meaning of these items differs significantly across training levels, which might explain why the model fit varies significantly.</p>
</sec>
</sec>
<sec>
<title>3.2.3. Interaction effects</title>
<p>Interaction effects occurred between discipline and training level. When separating Bachelor and Master students, MI between disciplines changed with training level: it decreased for the Theoretical Aspects factor but increased for the Technical Aspects factor (see <xref ref-type="supplementary-material" rid="SM1">Supplementary material B5</xref>, <xref ref-type="supplementary-material" rid="SM1">B6</xref>). Viewed from the other angle, MI between Bachelor and Master students was lower in the psychology than in the educational science sub-sample. This means that the differentiating effect of discipline on the latent structure of RSE is moderated by training level; put differently, the differentiating effect of training level is stronger among psychology students.</p>
</sec>
</sec>
</sec>
<sec id="s4">
<title>4. Discussion</title>
<p>This study aimed to propose and empirically test a differentiation hypothesis that might explain some of the heterogeneity in results on the RSE structure and measurement. We hypothesized differentiating effects that become salient in the course of higher education, affecting the latent structure and thus the construct meaning of RSE. The effects we investigated were program level and academic discipline, as well as their interaction. If these differentiating student characteristics go unregarded in the RSE literature, scholars miss an important factor for integrating results across common research efforts.</p>
<sec>
<title>4.1. ASER validity</title>
<p>Since valid measurement of RSE is a prerequisite for our endeavor, we delineated the validity of the employed and recently developed ASER with regard to construct validity and gender bias. Other scholars have described RSE as multi-faceted, and the existing American instruments all propose a multi-factor structure (Forester et al., <xref ref-type="bibr" rid="B15">2004</xref>). Regarding the factor structure, the ASER is still under development. In this sample, a hierarchical three-factor structure emerged as the best-fitting model. The three sub-factors refer to research self-efficacy regarding Theoretical Aspects, Technical Aspects, and Interpretation.</p>
<p>Construct validity of the Assessment of Self-Efficacy in Research (ASER) questionnaire was supported for all our hypotheses: relations to convergent items and discriminant constructs met expected directions and sizes. Thus, the ASER is fit to measure RSE as hypothesized within the nomological net, and ASER scores can be used to explore important relations in academic higher education.</p>
<p>Strict MI indicated that latent scores can be validly compared between female and male students. In doing so, the ASER reveals significant differences for the Technical Aspects factor but not for the other two factors. These mixed findings are in line with the mixed results reported previously: it may be that gender differences occur at the sub-scale level, but not at the total score level. The non-significant gender differences reported in a meta-analysis by Livin&#x00163;i et al. (<xref ref-type="bibr" rid="B35">2021</xref>) are in accordance with overall MI and MI for the Theoretical Aspects and Interpretation factors. However, Huang (<xref ref-type="bibr" rid="B22">2013</xref>) reports gender differences specifically regarding mathematics and social-sciences self-efficacy, sub-scales that may relate to the Technical Aspects factor of RSE that showed non-invariance in this study.</p>
</sec>
<sec>
<title>4.2. Structural differentiation</title>
<p>Previous empirical approaches to RSE measurement yield heterogeneous results regarding factor structure. We hypothesized that these differences can partly be explained by sample differences that engender structural differentiation and that have gone unregarded as of yet. We coined this the differentiation hypothesis; it attempts to reconcile heterogeneous findings on the RSE structure. Hence, a systematization in (in-)variance of the ASER three-factor structure was explored. Findings of strict MI between female and male students indicate that the ASER is generally fit for valid invariant measurement. Consequently, a lack of MI between certain groups can be attributed to real group differences and probably does not stem from weaknesses in measurement.</p>
<p>Measurement non-invariance was expected and tested between different academic disciplines and between different training levels through MGCFA. The overall moderate model fit can be explained by non-invariance between different sub-groups, which renders a very good model fit for the whole sample impossible. The results from the MI analyses are, thus, still valuable.</p>
<sec>
<title>4.2.1. Discipline</title>
<p>The results of MI analyses between psychology and educational science students were mixed. At least scalar MI was found for the Theoretical Aspects and the Interpretation factor, but not for the Technical Aspects factor. Thus, RSE appears to carry slightly different meanings across disciplines, which was especially apparent for research tasks such as data collection procedures or knowledge of appropriate analysis methods.</p>
<p>In conclusion, while RSE should be important in almost every academic program, the specific research and/or training cultures apparently differ between academic disciplines. The difference in mean scores on the Theoretical Aspects factor also underscores the importance of considering disciplines when researching RSE. Differences do, indeed, occur both at the structural and the mean score level, as well as across a variety of research tasks.</p>
<p>It would be interesting to investigate the reasons for these disciplinary differences in the perception of RSE. Do the main research practices differ across disciplines, and are they communicated differently to the students? Is the nature or the role of research perceived differently by students? Are research methods regarded and taught differently? In addition to psychology and educational science that were investigated in this study, taking into account further disciplines even beyond the social sciences might be insightful with respect to the investigation of disciplinary differentiation effects on RSE development.</p>
</sec>
<sec>
<title>4.2.2. Training level</title>
<p>Measurement invariance analyses between training levels, again, yielded mixed results. MI was scalar for the Technical Aspects and Interpretation factors, and metric for the Theoretical Aspects factor. Local structural equation modeling provided more detailed insight due to the continuous moderation of model parameters: significant variations in fit statistics, item loadings, and item residuals across semesters were revealed. These findings indicate a change in the meaning of the latent construct of &#x0201C;research self-efficacy&#x0201D; (Molenaar et al., <xref ref-type="bibr" rid="B40">2010</xref>). Most importantly, differences in model fit may indicate that not even configural MI (relating to the basic construct structure) can be validly assumed across all training levels: not only the meaning of each latent factor but also the number and segmentation of RSE factors might differentiate with time at university.</p>
<p>In conclusion, RSE changes with training level in two respects: First, changes in mean values can be attributed to increasing mastery/failure experience (Bandura, <xref ref-type="bibr" rid="B2">2006</xref>). Second, factor differentiation can be attributed to conceptual change in how research is regarded (Rochnia and Radisch, <xref ref-type="bibr" rid="B51">2021</xref>).</p>
<p>For an understanding of RSE development covering the whole qualification period of junior researchers, future studies might extend the investigation of the differentiation hypothesis to the doctoral level. However, unlike in other countries, doctoral qualification in Germany is not standardized: while there are a few distinguished doctoral training programs, most PhD candidates complete their PhD research in the context of regular employment as assistant researchers at universities, without ongoing formal methods education. Thus, Bachelor&#x00027;s and Master&#x00027;s programs are more similar to each other regarding methods training, since this is where coursework and institutionalized learning take place. The ASER was developed to assess RSE in Bachelor and Master students, as RSE measurement for this level was missing. RSE at the doctoral level can already be validly assessed by a questionnaire by Lachmann et al. (<xref ref-type="bibr" rid="B33">2018</xref>). This measure comprises research tasks that are relevant at the PhD level, but not yet for Bachelor and Master students (e.g., &#x0201C;I can build cooperations with central researchers in my field&#x0201D;). Delineating whether and how these two measures can be employed to capture RSE development over the whole qualification span would be an interesting endeavor for future RSE research.</p>
</sec>
<sec>
<title>4.2.3. Interaction</title>
<p>Four independent MI analyses were conducted to investigate dichotomous interaction effects of training level and discipline on RSE structure. Results indicate that there is an interaction effect: Invariance seems to change with program level, especially so for psychology students. The data support the notion that psychology students at the master level perceive research differently than the other sub-groups. One reason might be differences in emphases on methods education between the disciplines. Presumably, methods education is more emphasized in psychology than in educational science. Even for programs such as school or clinical psychology, where most students plan a career in applied settings, American graduate education aims to also strengthen students&#x00027; scientific competencies by explicitly implementing the so-called &#x0201C;scientist practitioner approach&#x0201D; (Jones and Mehr, <xref ref-type="bibr" rid="B24">2007</xref>). Klieme et al. (<xref ref-type="bibr" rid="B32">2020</xref>) called for a similar focus in educational science programs through research-based learning.</p>
<p>Especially given that measurement variance increases with program level, these differences can be reasoned to stem less from prior interests that influence the choice of program than from socialization processes once a program is studied. Future research might analyze both differences in research practices (methodology) and the research training environment.</p>
</sec>
</sec>
<sec>
<title>4.3. Limitations of this study</title>
<p>Our results indicate that differentiating effects on RSE are worth considering both in theory development and in higher education practice. However, implications drawn from them are limited since the current study is based on cross-sectional data. Deducing a process theory of intra-individual RSE development is, thus, beyond the scope of this study. Furthermore, evidence-based recommendations for higher education practice are to be considered with caution. Longitudinal RSE development might differ from the effects found in the cross-sectional data. In addition, predictors of RSE level and RSE differentiation are important factors to be considered in educational practice. These predictors potentially comprise various environmental factors as well as person factors. In this regard, longitudinal data as called for some time ago (Kahn, <xref ref-type="bibr" rid="B26">2000</xref>) and again recently (Livin&#x00163;i et al., <xref ref-type="bibr" rid="B35">2021</xref>) will be necessary to specify which effects are actually salient in educational practice. Thus, this cross-sectional study is only the first step toward a systematic investigation of RSE development as well as toward evidence-based practices in methods education.</p>
<p>In addition, the results stem from a German sample. The cultural generalizability of our results needs to be delineated. Potential differences in RSE development and structure might be attributed to twofold cultural effects: differences between national cultures in general, and differences in educational culture. Investigating relevant factors in these regards will be an interesting direction for international RSE research.</p>
<p>Furthermore, the three-factor structure of the ASER that fits the data of this study best needs to be confirmed in further German samples and internationally. Thus, the final structure, or, more specifically, the precise structural differentiation of RSE across relevant moderators, remains unresolved. This points to the chicken-and-egg problem in empirical research on the RSE structure: are we to begin with a specified theoretical structure? Or with no structure, which can then be developed freely and empirically from a content-valid item pool? How do we, then, judge findings on model fit and measurement non-invariance by means of traditional fit indices? Again, considering a lack of MI as a feature calls for elaborate methods that are fit for systematic investigations and judgments of said feature.</p>
</sec>
<sec>
<title>4.4. Implications and future research</title>
<p>The present study emphasized that measurement non-invariance in educational variables should be understood as a feature rather than a bug. The structural differentiation that we found in this study spotlights that RSE is complex and malleable. This interpretation supports a desired feature of academic education: conceptual change regarding research within university training (Rochnia and Radisch, <xref ref-type="bibr" rid="B51">2021</xref>) and, consequently, in self-efficacy beliefs about research. Overall, we see four main areas where implications from our study become important.</p>
<sec>
<title>4.4.1. Methodical considerations</title>
<p>The methodical implications of our endeavor to take a non-traditional perspective on measurement (non-)invariance are the basis for future research on any differentiation of latent constructs. If test developers seriously begin to consider non-invariance in measurement as a feature, strategies are needed to deal with the naturally resulting poor model fit in the configural baseline model, and with other procedures that have been employed to judge latent construct measurement under a perspective of desired invariance. Common procedures and cut-off values employed in current MI analyses do injustice to the concept of MI as a feature and need to be differentiated as well. In this regard, Rochnia and Radisch (<xref ref-type="bibr" rid="B51">2021</xref>) revive the AGB Typology by Golembiewski et al. (<xref ref-type="bibr" rid="B18">1976</xref>) that systematizes levels of change in measurement across time. By this means, Rochnia and Radisch (<xref ref-type="bibr" rid="B51">2021</xref>) refrain from considering a lack of invariance generally as a bug and emphasize that distinguishing different types of change in measurement may hold relevant information in educational settings. Regarding concrete analysis methods for MI, LSEM (Hildebrandt et al., <xref ref-type="bibr" rid="B20">2009</xref>) and moderated non-linear factor analysis (Bauer, <xref ref-type="bibr" rid="B3">2017</xref>) allow testing changes in model parameters as effects of certain moderating variables, such as semesters at university. However, while both approaches are well suited to investigate changes in meanings and emphases of latent constructs, they are not (yet) able to detect systemic changes in the factor structure between groups: they can investigate parameter changes within the same model, but not changes in the model structure itself (e.g., from a two- to a three-factor model). Furthermore, a developmental differentiation of RSE may manifest itself in a growing, more complex factor structure contingent on relevant moderating variables. This will render the test of invariant covariance matrices as suggested by Vandenberg and Lance (<xref ref-type="bibr" rid="B56">2000</xref>) relevant again&#x02014;a step in MI analysis that has been omitted in most recent MI practice (Putnick and Bornstein, <xref ref-type="bibr" rid="B47">2016</xref>). In addition, exploratory methods for structure analyses might be needed that can depict changes in the construct structure across a certain moderator.</p>
</sec>
<sec>
<title>4.4.2. Reconciliation of heterogeneous results</title>
<p>The identified variance in construct structure and constitution may explain previously reported heterogeneous results. Factorial differences between different measures of RSE have been pointed out repeatedly (e.g., Forester et al., <xref ref-type="bibr" rid="B15">2004</xref>; Livin&#x00163;i et al., <xref ref-type="bibr" rid="B35">2021</xref>). However, an explanation for these differences is lacking as of yet, but might be achieved by considering our differentiation hypothesis and examining the respective databases used for instrument development and factor analysis. The empirical structure of the Self-efficacy in Research Measure (Phillips and Russell, <xref ref-type="bibr" rid="B45">1994</xref>) was based on the data from 219 doctoral students in counseling psychology. In contrast, the structure of the Research Self-Efficacy Scale (RSES, Bieschke et al., <xref ref-type="bibr" rid="B6">1996</xref>) resulted from a factor analysis with the data from 177 doctoral students enrolled in various programs, namely biological sciences (32%), social sciences (28%), humanities (23%), and physical sciences (17%). Since our results suggest that students&#x00027; academic discipline affects the RSE factor structure, the heterogeneous results from these two studies might be reconciled by considering disciplinary differentiation effects. This could be achieved by re-analyzing the factor structure of the RSES separated by discipline. A more feasible approach might be to systematically recognize and investigate the effects of the disciplines that participants are sampled from in future studies.</p>
<p>Furthermore, inconsistencies in the relations of RSE to other constructs have been pointed out regarding gender (Livin&#x00163;i et al., <xref ref-type="bibr" rid="B35">2021</xref>), research training environment, year in the program, interest in research, and research outcome expectations (Bieschke, <xref ref-type="bibr" rid="B5">2006</xref>). Again, such heterogeneous results may be reconciled by taking a closer look at sample characteristics that affect construct structure and meaning. Specifically, if the meaning of a latent construct, namely RSE, is inconsistent across samples, empirical relations to neighboring constructs may be inconsistent as well. In their recent meta-analysis, Livin&#x00163;i et al. (<xref ref-type="bibr" rid="B35">2021</xref>) point out that student samples in RSE research are quite heterogeneous, as participants are sampled from different training levels (undergraduate vs. graduate training) and from various disciplines such as counseling psychology, education, STEM education, and even law. Consequently, inconsistencies in construct relations may be explained under the differentiation hypothesis, since we found differentiation effects on the RSE structure for both of these dimensions, training level and discipline. The exemplary analysis of two studies on the relation between RSE and training environment does, indeed, reveal differences in sample characteristics. A non-significant relation reported by Phillips et al. (<xref ref-type="bibr" rid="B46">2004</xref>) is based on a sample of 84 individuals who had already completed their full doctoral degree (85%) or all requirements but the thesis. In contrast, a significant relation reported by Kahn (<xref ref-type="bibr" rid="B27">2001</xref>) is based on a sample of 149 doctoral students of which 50% were still in their first 2 years of training.</p>
<p>In conclusion, in the context of theory development, the lack of MI in RSE measurement should caution us against comparing findings on RSE structure, mean scores, or relations to other variables that stem from data on students with different backgrounds. Similarly, practical implications drawn from empirical RSE research should be given very cautiously, with regard to whether the targeted group that implications are drawn for is sufficiently represented by the empirical sample. Therefore, future research should give more thought to potential sub-groups that have gone unregarded as of yet in order to reveal the differentiated picture of RSE that is needed. To do so, differentiated analyses are called for that are fit to model a lack of MI as a feature.</p>
</sec>
<sec>
<title>4.4.3. Research self-efficacy development and differentiation</title>
<p>Considering MI between training levels will foster a process-oriented theory of RSE structure and development. The differentiation hypothesis that we propose in this article might be one unifying factor in RSE theory development. Concrete characteristics that affect RSE mean scores as well as structural differentiation can be deduced from previous research (e.g., Livin&#x00163;i et al., <xref ref-type="bibr" rid="B35">2021</xref>) and social cognitive career theory (Lent et al., <xref ref-type="bibr" rid="B34">1994</xref>). These comprise person factors, such as interest, and environmental factors, such as aspects of research training, as antecedents of RSE development. Livin&#x00163;i et al. (<xref ref-type="bibr" rid="B35">2021</xref>) identified these antecedents of RSE in the context of social cognitive career theory. The studies included in their meta-analysis, however, exhibit two weak spots: first, they stem from cross-sectional data. Second, they analyze RSE total scores only, without taking structural differentiation into account.</p>
<p>Our results demonstrate the importance of structural differentiation, based not only on main effects but also on complex interactions, for example, between the discipline studied and time in the program. Future research should address such potential interaction effects of discipline and the amount of methods training on factor model parameters in more detail. In a way, this interaction describes the culture and aspiration under which research is taught and addressed within a university program. Are students introduced to research methods and research matters in their discipline? Are they given the opportunity for their own (mastery) experience? Do faculty model scientific behavior and attitudes, and do they, thereby, socialize students into a (discipline-)specific way of conceptualizing and approaching research? The theory of the research training environment (Gelso, <xref ref-type="bibr" rid="B16">1993</xref>) stems from an investigation of scientist&#x02013;practitioner teaching in U.S. psychology programs and incorporates those questions.</p>
<p>Indeed, previous researchers have identified these factors of the research training environment as a predictor of RSE in cross-sectional studies (e.g., Kahn and Schlosser, <xref ref-type="bibr" rid="B28">2010</xref>). One study explored the longitudinal effect of the research training environment on changes in RSE after 1 year in graduate training: Kahn (<xref ref-type="bibr" rid="B26">2000</xref>) found effects for one training aspect, namely the student&#x02013;mentor relation. Like Kahn, Livin&#x00163;i et al. (<xref ref-type="bibr" rid="B35">2021</xref>) conclude from their meta-analysis that further longitudinal studies are needed. Analyzing students&#x00027; research training environment might serve as an explanation for the interaction effect of discipline and training level on RSE differentiation. Longitudinal studies can, in addition, give insight into the intra-individual development of RSE, as well as delineate additional causal effects of possible predictors beyond the training environment (such as general self-efficacy, interest, or other person variables). Understanding causes, interactions, and onsets of this intra-individual change will help refine a theory of RSE development regarding both RSE scores and structural changes.</p>
</sec>
<sec>
<title>4.4.4. Higher education practice</title>
<p>A fine-grained investigation of structural changes can help clarify how students think about research and how this affects their self-efficacy at different levels. This might help enhance and customize methods education to the &#x0201C;zone of proximal development.&#x0201D; Again, longitudinal studies may investigate these intra-personal structural changes in RSE as an effect of specific person and environmental factors.</p>
<p>Developing an understanding of RSE in academic education can serve several goals in higher education practice. On an individual scale, considering students&#x00027; RSE can support our understanding of their individual needs regarding methods education. Specifically, insights from longitudinal studies will enable educators, faculty, and mentors to identify said &#x0201C;zones of proximal development&#x0201D; with regard to specific research tasks that students should be exposed to. This way, they can tailor methods education to students&#x00027; needs, focusing on tasks in which students perceive low self-efficacy. In addition, students might use the differentiating facets of RSE for the self-assessment of their own development (Forester et al., <xref ref-type="bibr" rid="B15">2004</xref>), especially in self-regulated learning settings. This can help them delineate which research tasks they should acquaint themselves with next. Focusing on self-efficacy development, beyond mere research knowledge, will draw both students&#x00027; and educators&#x00027; attention toward learning settings that deliberately incorporate its facilitation. Specifically, such settings should emphasize the main factors that influence self-efficacy beliefs: they should facilitate mastery experiences through student activity and be mindful of the communicated perspective on the value and epistemological nature of research. Our results show that the latent structure of RSE differs between students based on their disciplinary socialization. Thus, zones of proximal development are contingent not only on a student&#x00027;s training level but also on other context factors that need to be considered in individual mentoring. Our results indicate that academic discipline is such a context factor, but systematic investigations of longitudinal data on individual RSE development are needed to specify and confirm differentiating factors that operate on an individual level.</p>
<p>More generally, including RSE in higher education research may enhance our understanding of the psychological processes that underlie successful academic development. Longitudinal studies should, thus, delineate specific factors in the training environment that prove promotive of RSE. Prospectively, these factors will enable faculty to foster students&#x00027; RSE, following evidence-based reasoning. However, considering the differentiation hypothesis, RSE-sensitive practices in methods training can never simply be carried over from one setting to another. It follows that university didactics need to develop approaches to fostering RSE that are customized to different disciplines and training levels. These customized approaches can only be delineated through longitudinal studies that systematically investigate possible differentiation effects that operate on the group level.</p>
<p>Once the role of RSE in higher educational processes is delineated, its assessment might be an asset to the evaluation of research training settings on a broader scale, such as settings that employ research-based learning in Europe (Wessels et al., <xref ref-type="bibr" rid="B57">2021</xref>), or the scientist&#x02013;practitioner approach in the U.S. (for a description see Jones and Mehr, <xref ref-type="bibr" rid="B24">2007</xref>). Their focus on student activity (Huber, <xref ref-type="bibr" rid="B23">2014</xref>) can be reasoned to particularly enable mastery experience. Therefore, research self-efficacy should be regarded as a core outcome of such student-active learning environments and should, thus, be considered in program evaluation.</p>
<p>Taken together, the findings reported here add to the conclusion that further investigation of RSE differentiation is needed, worthwhile, and beneficial. Understanding causes and onsets of change in the RSE structure will help refine both a theory of RSE development and university methods training.</p>
</sec>
</sec>
</sec>
<sec sec-type="data-availability" id="s5">
<title>Data availability statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec sec-type="ethics-statement" id="s6">
<title>Ethics statement</title>
<p>Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. The patients/participants provided their written informed consent to participate in this study.</p>
</sec>
<sec sec-type="author-contributions" id="s7">
<title>Author contributions</title>
<p>KK: conceptualization (lead), investigation, methodology, data curation, formal analysis, writing&#x02014;original draft preparation, and writing&#x02014;review and editing (lead). FS-B: conceptualization (supporting), resources, and writing&#x02014;review and editing (supporting). All authors contributed to the article and approved the submitted version.</p>
</sec>
</body>
<back>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s8">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="s9">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/feduc.2023.1092714/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/feduc.2023.1092714/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Table_1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table_2.pdf" id="SM2" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Presentation_1.pdf" id="SM3" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>See <xref ref-type="supplementary-material" rid="SM1">Supplementary material A</xref> for analyses of measurement invariance between U.S. American and German students.</p></fn>
<fn id="fn0002"><p><sup>2</sup>The jangle fallacy describes &#x0201C;the use of two separate words or expressions covering in fact the same basic situation, but sounding different, as though they were in truth different&#x0201D; (Kelley, <xref ref-type="bibr" rid="B30">1927</xref>, p. 64).</p></fn>
<fn id="fn0003"><p><sup>3</sup>Here, only the main findings are reported to enhance readability. All details on fit statistics and difference tests for the nested MGCFA models are displayed in the <xref ref-type="supplementary-material" rid="SM1">Supplementary material B</xref>.</p></fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bandura</surname> <given-names>A.</given-names></name></person-group> (<year>1997</year>). <source>Self-efficacy: The Exercise of Control</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Freeman</publisher-name>.</citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bandura</surname> <given-names>A.</given-names></name></person-group> (<year>2006</year>). <article-title>&#x0201C;Guide for constructing self-efficacy scales,&#x0201D;</article-title> in <source>Self-efficacy Beliefs of Adolescents</source>, eds F. Pajares and T. Urdan (<publisher-loc>Greenwich, CT</publisher-loc>: <publisher-name>Information Age Publishing</publisher-name>), <fpage>307</fpage>&#x02013;<lpage>337</lpage>.</citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bauer</surname> <given-names>D. J.</given-names></name></person-group> (<year>2017</year>). <article-title>A more general model for testing measurement invariance and differential item functioning</article-title>. <source>Psychol. Methods</source> <volume>22</volume>, <fpage>507</fpage>&#x02013;<lpage>526</lpage>. <pub-id pub-id-type="doi">10.1037/met0000077</pub-id><pub-id pub-id-type="pmid">27266798</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beierlein</surname> <given-names>C.</given-names></name> <name><surname>Kovaleva</surname> <given-names>A.</given-names></name> <name><surname>Kemper</surname> <given-names>C.</given-names></name> <name><surname>Rammstedt</surname> <given-names>B.</given-names></name></person-group> (<year>2012</year>). <source>Ein Messinstrument zur Erfassung subjektiver Kompetenzerwartungen Allgemeine Selbstwirksamkeit Kurzskala (ASKU). GESIS-Working Papers 17</source>. (<publisher-loc>K&#x000F6;ln</publisher-loc>: <publisher-name>GESIS - Leibniz-Institut f&#x000FC;r Sozialwissenschaften</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>24</lpage>.</citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bieschke</surname> <given-names>K. J.</given-names></name></person-group> (<year>2006</year>). <article-title>Research self-efficacy beliefs and research outcome expectations: Implications for developing scientifically minded psychologists</article-title>. <source>J. Career Assess.</source> <volume>14</volume>, <fpage>77</fpage>&#x02013;<lpage>91</lpage>. <pub-id pub-id-type="doi">10.1177/1069072705281366</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bieschke</surname> <given-names>K. J.</given-names></name> <name><surname>Bishop</surname> <given-names>R. M.</given-names></name> <name><surname>Garcia</surname> <given-names>V. L.</given-names></name></person-group> (<year>1996</year>). <article-title>The utility of the research self-efficacy scale</article-title>. <source>J. Career Assess.</source> <volume>4</volume>, <fpage>59</fpage>&#x02013;<lpage>75</lpage>. <pub-id pub-id-type="doi">10.1177/106907279600400104</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>F.</given-names></name></person-group> (<year>2007</year>). <article-title>Sensitivity of goodness of fit indexes to lack of measurement invariance</article-title>. <source>Struct. Eq. Model.</source> <volume>14</volume>, <fpage>464</fpage>&#x02013;<lpage>504</lpage>. <pub-id pub-id-type="doi">10.1080/10705510701301834</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>F.</given-names></name> <name><surname>Curran</surname> <given-names>P. J.</given-names></name> <name><surname>Bollen</surname> <given-names>K. A.</given-names></name> <name><surname>Kirby</surname> <given-names>J.</given-names></name> <name><surname>Paxton</surname> <given-names>P.</given-names></name></person-group> (<year>2008</year>). <article-title>An empirical evaluation of the use of fixed cutoff points in RMSEA test statistic in structural equation models</article-title>. <source>Sociol. Methods Res.</source> <volume>36</volume>, <fpage>462</fpage>&#x02013;<lpage>494</lpage>. <pub-id pub-id-type="doi">10.1177/0049124108314720</pub-id><pub-id pub-id-type="pmid">19756246</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Choi</surname> <given-names>M.</given-names></name></person-group> (<year>2005</year>). <article-title>Self-efficacy and self-concept as predictors of college students&#x00027; academic performance</article-title>. <source>Psychol. Schools</source> <volume>42</volume>, <fpage>197</fpage>&#x02013;<lpage>205</lpage>. <pub-id pub-id-type="doi">10.1002/pits.20048</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clark</surname> <given-names>L. A.</given-names></name> <name><surname>Watson</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Constructing validity: New developments in creating objective measuring instruments</article-title>. <source>Psychol. Assess.</source> <volume>31</volume>, <fpage>1412</fpage>&#x02013;<lpage>1427</lpage>. <pub-id pub-id-type="doi">10.1037/pas0000626</pub-id><pub-id pub-id-type="pmid">30896212</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Feyter</surname> <given-names>T.</given-names></name> <name><surname>Caers</surname> <given-names>R.</given-names></name> <name><surname>Vigna</surname> <given-names>C.</given-names></name> <name><surname>Berings</surname> <given-names>D.</given-names></name></person-group> (<year>2012</year>). <article-title>Unraveling the impact of the Big Five personality traits on academic performance: The moderating and mediating effects of self-efficacy and academic motivation</article-title>. <source>Learn. Individ. Diff.</source> <volume>22</volume>, <fpage>439</fpage>&#x02013;<lpage>448</lpage>. <pub-id pub-id-type="doi">10.1016/j.lindif.2012.03.013</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dickh&#x000E4;user</surname> <given-names>O.</given-names></name> <name><surname>Sch&#x000F6;ne</surname> <given-names>C.</given-names></name> <name><surname>Spinath</surname> <given-names>B.</given-names></name> <name><surname>Stiensmeier-Pelster</surname> <given-names>J.</given-names></name></person-group> (<year>2002</year>). <article-title>Die Skalen zum akademischen Selbstkonzept Konstruktion und &#x000DC;berpr&#x000FC;fung eines neuen Instrumentes. [The Academic Self Concept Scales: Construction and Evaluation of a New Instrument]</article-title>. <source>Zeitschrift f&#x000FC;r Differentielle und Diagnostische Psychologie</source> <volume>23</volume>, <fpage>393</fpage>&#x02013;<lpage>405</lpage>. <pub-id pub-id-type="doi">10.1024//0170-1789.23.4.393</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Finney</surname> <given-names>S. J.</given-names></name> <name><surname>Schraw</surname> <given-names>G.</given-names></name></person-group> (<year>2003</year>). <article-title>Self-efficacy beliefs in college statistics courses</article-title>. <source>Contemp. Educ. Psychol.</source> <volume>28</volume>, <fpage>161</fpage>&#x02013;<lpage>186</lpage>. <pub-id pub-id-type="doi">10.1016/S0361-476X(02)00015-2</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fischer</surname> <given-names>R.</given-names></name> <name><surname>Karl</surname> <given-names>J. A.</given-names></name></person-group> (<year>2019</year>). <article-title>A primer to (cross-cultural) multi-group invariance testing possibilities in R</article-title>. <source>Front. Psychol.</source> <volume>10</volume>, <fpage>1507</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2019.01507</pub-id><pub-id pub-id-type="pmid">31379641</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Forester</surname> <given-names>M.</given-names></name> <name><surname>Kahn</surname> <given-names>J.</given-names></name> <name><surname>McInnis</surname> <given-names>M.</given-names></name></person-group> (<year>2004</year>). <article-title>Factor structures of three measures of research self-efficacy</article-title>. <source>J. Career Asess.</source> <volume>12</volume>, <fpage>3</fpage>&#x02013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1177/1069072703257719</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gelso</surname> <given-names>C. J.</given-names></name></person-group> (<year>1993</year>). <article-title>On the making of a scientist-practitioner: A theory of research training in professional psychology</article-title>. <source>Prof. Psychol.</source> <volume>24</volume>, <fpage>468</fpage>&#x02013;<lpage>476</lpage>. <pub-id pub-id-type="doi">10.1037/0735-7028.24.4.468</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gess</surname> <given-names>C.</given-names></name> <name><surname>Geiger</surname> <given-names>C.</given-names></name> <name><surname>Ziegler</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Social-scientific research competency: Validation of test score interpretations for evaluative purposes in higher education</article-title>. <source>Eur. J. Psychol. Assess</source>. 2018, a000451. <pub-id pub-id-type="doi">10.1027/1015-5759/a000451</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Golembiewski</surname> <given-names>R. T.</given-names></name> <name><surname>Billingsley</surname> <given-names>K.</given-names></name> <name><surname>Yeager</surname> <given-names>S.</given-names></name></person-group> (<year>1976</year>). <article-title>Measuring change and persistence in human affairs: Types of change generated by OD designs</article-title>. <source>J. Appl. Behav. Sci.</source> <volume>12</volume>, <fpage>133</fpage>&#x02013;<lpage>157</lpage>. <pub-id pub-id-type="doi">10.1177/002188637601200201</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hildebrandt</surname> <given-names>A.</given-names></name> <name><surname>L&#x000FC;dtke</surname> <given-names>O.</given-names></name> <name><surname>Robitzsch</surname> <given-names>A.</given-names></name> <name><surname>Sommer</surname> <given-names>C.</given-names></name> <name><surname>Wilhelm</surname> <given-names>O.</given-names></name></person-group> (<year>2016</year>). <article-title>Exploring factor model parameters across continuous variables with local structural equation models</article-title>. <source>Multivar Behav. Res.</source> <volume>51</volume>, <fpage>257</fpage>&#x02013;<lpage>258</lpage>. <pub-id pub-id-type="doi">10.1080/00273171.2016.1142856</pub-id><pub-id pub-id-type="pmid">27049892</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hildebrandt</surname> <given-names>A.</given-names></name> <name><surname>Wilhelm</surname> <given-names>O.</given-names></name> <name><surname>Robitzsch</surname> <given-names>A.</given-names></name></person-group> (<year>2009</year>). <article-title>Complementary and competing factor analytic approaches for the investigation of measurement invariance</article-title>. <source>Rev. Psychol.</source> <volume>16</volume>, <fpage>87</fpage>&#x02013;<lpage>102</lpage>.</citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holden</surname> <given-names>G.</given-names></name> <name><surname>Barker</surname> <given-names>K.</given-names></name> <name><surname>Meenaghan</surname> <given-names>T.</given-names></name> <name><surname>Rosenberg</surname> <given-names>G.</given-names></name></person-group> (<year>1999</year>). <article-title>Research self-efficacy</article-title>. <source>J. Soc. Work Educ.</source> <volume>35</volume>, <fpage>463</fpage>&#x02013;<lpage>476</lpage>. <pub-id pub-id-type="doi">10.1080/10437797.1999.10778982</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>C.</given-names></name></person-group> (<year>2013</year>). <article-title>Gender differences in academic self-efficacy: A meta-analysis</article-title>. <source>Eur. J. Psychol. Educ.</source> <volume>28</volume>, <fpage>1</fpage>&#x02013;<lpage>35</lpage>. <pub-id pub-id-type="doi">10.1007/s10212-011-0097-y</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huber</surname> <given-names>L.</given-names></name></person-group> (<year>2014</year>). <article-title>Forschungsbasiertes, Forschungsorientiertes, Forschendes Lernen: Alles dasselbe? Ein Pl&#x000E4;doyer f&#x000FC;r eine Verst&#x000E4;ndigung &#x000FC;ber Begriffe und Unterscheidungen im Feld forschungsnahen Lehrens und Lernens</article-title>. <source>Das Hochschulwesen</source> <volume>62</volume>, <fpage>32</fpage>&#x02013;<lpage>39</lpage>.</citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jones</surname> <given-names>J.</given-names></name> <name><surname>Mehr</surname> <given-names>S.</given-names></name></person-group> (<year>2007</year>). <article-title>Foundations and assumptions of the scientist-practitioner model</article-title>. <source>Am. Behav. Scientist</source> <volume>50</volume>, <fpage>766</fpage>&#x02013;<lpage>777</lpage>. <pub-id pub-id-type="doi">10.1177/0002764206296454</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>J&#x000F6;reskog</surname> <given-names>K. G.</given-names></name></person-group> (<year>1971</year>). <article-title>Simultaneous factor analysis in several populations</article-title>. <source>Psychometrika</source> <volume>36</volume>, <fpage>409</fpage>&#x02013;<lpage>426</lpage>. <pub-id pub-id-type="doi">10.1007/BF02291366</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kahn</surname> <given-names>J. H.</given-names></name></person-group> (<year>2000</year>). <article-title>&#x0201C;Research training environment changes: Impacts on research self- efficacy and interest,&#x0201D;</article-title> in <source>Research Training in Counseling Psychology: New Advances and Directions. Symposium conducted at the Annual Convention of the American Psychological Association</source>. Washington, DC.</citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kahn</surname> <given-names>J. H.</given-names></name></person-group> (<year>2001</year>). <article-title>Predicting the scholarly activity of counseling psychology students: A refinement and Extension</article-title>. <source>J. Counsel. Psychol.</source> <volume>48</volume>, <fpage>344</fpage>&#x02013;<lpage>354</lpage>. <pub-id pub-id-type="doi">10.1037/0022-0167.48.3.344</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kahn</surname> <given-names>J. H.</given-names></name> <name><surname>Schlosser</surname> <given-names>L. Z.</given-names></name></person-group> (<year>2010</year>). <article-title>The graduate research training environment in professional psychology: A multilevel investigation</article-title>. <source>Train. Educ. Prof. Psychol.</source> <volume>4</volume>, <fpage>183</fpage>&#x02013;<lpage>193</lpage>. <pub-id pub-id-type="doi">10.1037/a0018968</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kahn</surname> <given-names>J. H.</given-names></name> <name><surname>Scott</surname> <given-names>N. A.</given-names></name></person-group> (<year>1997</year>). <article-title>Predictors of research productivity and science-related career goals among counseling psychology doctoral students</article-title>. <source>Counsel. Psychologist</source> <volume>25</volume>, <fpage>38</fpage>&#x02013;<lpage>67</lpage>. <pub-id pub-id-type="doi">10.1177/0011000097251005</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kelley</surname> <given-names>T. L.</given-names></name></person-group> (<year>1927</year>). <source>Interpretation of Educational Measurements.</source> <publisher-loc>Yonkers-on-Hudson, NY</publisher-loc>: <publisher-name>World Book Company</publisher-name>.</citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klieme</surname> <given-names>K. E.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Psychological Factors in Academic Education &#x02013; Development of the Self-Efficacy in Research Questionnaire,&#x0201D;</article-title> in <source>Hochschullehre im Spannungsfeld zwischen individueller und institutioneller Verantwortung. Tagungsband der 15. Jahrestagung der Gesellschaft f&#x000FC;r Hochschulforschung</source>, eds C. Bohndick, M. B&#x000FC;low-Schramm, D. Paul, and G. Reinmann (<publisher-loc>Wiesbaden</publisher-loc>: <publisher-name>Springer VS</publisher-name>), <fpage>309</fpage>&#x02013;<lpage>322</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-658-32272-4_23</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klieme</surname> <given-names>K. E.</given-names></name> <name><surname>Lehmann</surname> <given-names>T.</given-names></name> <name><surname>Schmidt-Borcherding</surname> <given-names>F.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Fostering professionalism and scientificity through integration of disciplinary and research knowledge,&#x0201D;</article-title> in <source>International Perspectives on Knowledge Integration: Theory, Research, and Good Practice in Pre-service Teacher and Higher Education</source>, ed T. Lehman (<publisher-loc>Leiden, Boston, MA</publisher-loc>: <publisher-name>Brill; Sense Publishers</publisher-name>), <fpage>79</fpage>&#x02013;<lpage>107</lpage>. <pub-id pub-id-type="doi">10.1163/9789004429499_005</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lachmann</surname> <given-names>D.</given-names></name> <name><surname>Epstein</surname> <given-names>N.</given-names></name> <name><surname>Eberle</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>FoSWE &#x02013; Eine Kurzskala zur Erfassung forschungsbezogener Selbstwirksamkeitserwartung</article-title>. <source>Zeitschrift f&#x000FC;r P&#x000E4;dagogische Psychologie</source> <volume>32</volume>, <fpage>89</fpage>&#x02013;<lpage>100</lpage>. <pub-id pub-id-type="doi">10.1024/1010-0652/a000217</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lent</surname> <given-names>R. W.</given-names></name> <name><surname>Brown</surname> <given-names>S. D.</given-names></name> <name><surname>Hackett</surname> <given-names>G.</given-names></name></person-group> (<year>1994</year>). <article-title>Toward a unifying social cognitive theory of career and academic interest, choice, and performance</article-title>. <source>J. Voc. Behav.</source> <volume>45</volume>, <fpage>79</fpage>&#x02013;<lpage>122</lpage>. <pub-id pub-id-type="doi">10.1006/jvbe.1994.1027</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Livin&#x00163;i</surname> <given-names>R.</given-names></name> <name><surname>Gunnesch-Luca</surname> <given-names>G.</given-names></name> <name><surname>Iliescu</surname> <given-names>D.</given-names></name></person-group> (<year>2021</year>). <article-title>Research self-efficacy: A meta-analysis</article-title>. <source>Educ. Psychologist</source> <volume>56</volume>, <fpage>215</fpage>&#x02013;<lpage>242</lpage>. <pub-id pub-id-type="doi">10.1080/00461520.2021.1886103</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marsh</surname> <given-names>H.</given-names></name> <name><surname>Trautwein</surname> <given-names>U.</given-names></name> <name><surname>L&#x000FC;dtke</surname> <given-names>O.</given-names></name> <name><surname>K&#x000F6;ller</surname> <given-names>O.</given-names></name> <name><surname>Baumert</surname> <given-names>J.</given-names></name></person-group> (<year>2006</year>). <article-title>Integration of multidimensional self-concept and core personality constructs: Construct validation and relations to well-being and achievement</article-title>. <source>J. Personal.</source> <volume>74</volume>, <fpage>403</fpage>&#x02013;<lpage>456</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-6494.2005.00380.x</pub-id><pub-id pub-id-type="pmid">16529582</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mason</surname> <given-names>L.</given-names></name> <name><surname>Boscolo</surname> <given-names>P.</given-names></name> <name><surname>Tornatora</surname> <given-names>M. C.</given-names></name> <name><surname>Ronconi</surname> <given-names>L.</given-names></name></person-group> (<year>2013</year>). <article-title>Besides knowledge: a cross-sectional study on the relations between epistemic beliefs, achievement goals, self-beliefs, and achievement in science</article-title>. <source>Instruct. Sci.</source> <volume>41</volume>, <fpage>49</fpage>&#x02013;<lpage>79</lpage>. <pub-id pub-id-type="doi">10.1007/s11251-012-9210-0</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meredith</surname> <given-names>W.</given-names></name></person-group> (<year>1993</year>). <article-title>Measurement invariance, factor analysis and factorial invariance</article-title>. <source>Psychometrika</source> <volume>58</volume>, <fpage>525</fpage>&#x02013;<lpage>543</lpage>. <pub-id pub-id-type="doi">10.1007/BF02294825</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mieg</surname> <given-names>H. A.</given-names></name> <name><surname>Ambos</surname> <given-names>E.</given-names></name> <name><surname>Brew</surname> <given-names>A.</given-names></name> <name><surname>Lehmann</surname> <given-names>J.</given-names></name> <name><surname>Galli</surname> <given-names>D.</given-names></name></person-group> (<year>2022</year>). <source>The Cambridge Handbook of Undergraduate Research</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>. <pub-id pub-id-type="doi">10.1017/9781108869508</pub-id><pub-id pub-id-type="pmid">31932384</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Molenaar</surname> <given-names>D.</given-names></name> <name><surname>Dolan</surname> <given-names>C. V.</given-names></name> <name><surname>Wicherts</surname> <given-names>J. M.</given-names></name> <name><surname>van der Maas</surname> <given-names>H. L. J.</given-names></name></person-group> (<year>2010</year>). <article-title>Modeling differentiation of cognitive abilities within the higher-order factor model using moderated factor analysis</article-title>. <source>Intelligence</source> <volume>38</volume>, <fpage>611</fpage>&#x02013;<lpage>624</lpage>. <pub-id pub-id-type="doi">10.1016/j.intell.2010.09.002</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Oberski</surname> <given-names>D.</given-names></name></person-group> (<year>2014</year>). <article-title>lavaan.survey: An R package for complex survey analysis of structural equation models</article-title>. <source>J. Statist. Softw.</source> <volume>57</volume>, <fpage>1</fpage>&#x02013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.18637/jss.v057.i01</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>O&#x00027;Brien</surname> <given-names>K. M.</given-names></name> <name><surname>Malone</surname> <given-names>M. E.</given-names></name> <name><surname>Schmidt</surname> <given-names>C. K.</given-names></name> <name><surname>Lucas</surname> <given-names>M. S.</given-names></name></person-group> (<year>1998</year>). <article-title>&#x0201C;Research self-efficacy: Improvements in instrumentation,&#x0201D;</article-title> in <source>Poster Session Presented at the Annual Conference of the American Psychological Association</source>. San Francisco, CA.</citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Papanastasiou</surname> <given-names>E.</given-names></name></person-group> (<year>2014</year>). <article-title>Revised-attitudes toward research scale (R-ATR). A first look at its psychometric properties</article-title>. <source>J. Res. Educ.</source> <volume>24</volume>, <fpage>146</fpage>&#x02013;<lpage>159</lpage>. <pub-id pub-id-type="doi">10.1037/t35506-000</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pfeiffer</surname> <given-names>H.</given-names></name> <name><surname>Preckel</surname> <given-names>F.</given-names></name> <name><surname>Ellwart</surname> <given-names>T.</given-names></name></person-group> (<year>2018</year>). <article-title>Selbstwirksamkeitserwartung von Studierenden. Facettentheoretische Validierung eines Messmodells am Beispiel der Psychologie</article-title>. <source>Diagnostica</source> <volume>64</volume>, <fpage>133</fpage>&#x02013;<lpage>144</lpage>. <pub-id pub-id-type="doi">10.1026/0012-1924/a000199</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Phillips</surname> <given-names>J. C.</given-names></name> <name><surname>Russell</surname> <given-names>R. K.</given-names></name></person-group> (<year>1994</year>). <article-title>Research self-efficacy, the research training environment, and research productivity among graduate students in counseling psychology</article-title>. <source>Counsel. Psychologist</source> <volume>22</volume>, <fpage>628</fpage>&#x02013;<lpage>641</lpage>. <pub-id pub-id-type="doi">10.1177/0011000094224008</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Phillips</surname> <given-names>J. C.</given-names></name> <name><surname>Szymanski</surname> <given-names>D. M.</given-names></name> <name><surname>Ozegovic</surname> <given-names>J. J.</given-names></name> <name><surname>Briggs-Phillips</surname> <given-names>M.</given-names></name></person-group> (<year>2004</year>). <article-title>Preliminary examination and measurement of the internship research training environment</article-title>. <source>J. Counsel. Psychol.</source> <volume>51</volume>, <fpage>240</fpage>&#x02013;<lpage>248</lpage>. <pub-id pub-id-type="doi">10.1037/0022-0167.51.2.240</pub-id><pub-id pub-id-type="pmid">23274077</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Putnick</surname> <given-names>D. L.</given-names></name> <name><surname>Bornstein</surname> <given-names>M. H.</given-names></name></person-group> (<year>2016</year>). <article-title>Measurement invariance conventions and reporting: The state of the art and future directions for psychological research</article-title>. <source>Develop. Rev.</source> <volume>41</volume>, <fpage>71</fpage>&#x02013;<lpage>90</lpage>. <pub-id pub-id-type="doi">10.1016/j.dr.2016.06.004</pub-id><pub-id pub-id-type="pmid">27942093</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><collab>R Core Team</collab></person-group>. (<year>2022</year>). <source>R: A Language and Environment for Statistical Computing</source>. <publisher-loc>Vienna</publisher-loc>: <publisher-name>R Foundation for Statistical Computing</publisher-name>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.R-project.org/">https://www.R-project.org/</ext-link> (accessed December 2, 2022).</citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rammstedt</surname> <given-names>B.</given-names></name> <name><surname>John</surname> <given-names>O.</given-names></name></person-group> (<year>2005</year>). <article-title>Kurzversion des Big Five Inventory (BFI-K): Entwicklung und Validierung eines &#x000F6;konomischen Inventars zur Erfassung der f&#x000FC;nf Faktoren der Pers&#x000F6;nlichkeit</article-title>. <source>Diagnostica</source> <volume>51</volume>, <fpage>195</fpage>&#x02013;<lpage>206</lpage>. <pub-id pub-id-type="doi">10.1026/0012-1924.51.4.195</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Robitzsch</surname> <given-names>A.</given-names></name></person-group> (<year>2015</year>). <source>sirt: Supplementary Item Response Theory Models. R Package Version 1.8-9</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://cran.r-project.org/web/packages/sirt/">http://cran.r-project.org/web/packages/sirt/</ext-link> (accessed June 3, 2022).</citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rochnia</surname> <given-names>M.</given-names></name> <name><surname>Radisch</surname> <given-names>F.</given-names></name></person-group> (<year>2021</year>). <article-title>Die unver&#x000E4;nderliche Ver&#x000E4;nderbarkeit und der unterschiedliche Unterschied &#x02013; Varianz nachweisen mit Invarianz</article-title>. <source>Bildungsforschung</source> 2021, 2. <pub-id pub-id-type="doi">10.25539/bildungsforschun.v0i2.410</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rosseel</surname> <given-names>Y.</given-names></name></person-group> (<year>2012</year>). <article-title>lavaan: An R package for structural equation modeling</article-title>. <source>J. Statist. Softw</source>. <volume>48</volume>, <fpage>1</fpage>&#x02013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.18637/jss.v048.i02</pub-id><pub-id pub-id-type="pmid">25601849</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Royalty</surname> <given-names>G. M.</given-names></name> <name><surname>Reising</surname> <given-names>G. N.</given-names></name></person-group> (<year>1986</year>). <article-title>The research training of counseling psychologists: What the professionals say</article-title>. <source>Couns. Psychol.</source> <volume>14</volume>, <fpage>49</fpage>&#x02013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1177/0011000086141005</pub-id><pub-id pub-id-type="pmid">33272065</pub-id></citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stajkovic</surname> <given-names>A. D.</given-names></name> <name><surname>Bandura</surname> <given-names>A.</given-names></name> <name><surname>Locke</surname> <given-names>E. A.</given-names></name> <name><surname>Lee</surname> <given-names>D.</given-names></name> <name><surname>Sergent</surname> <given-names>K.</given-names></name></person-group> (<year>2018</year>). <article-title>Test of three conceptual models of influence of the big five personality traits and self-efficacy on academic performance: A meta-analytic path-analysis</article-title>. <source>Personal. Individ. Diff.</source> <volume>120</volume>, <fpage>238</fpage>&#x02013;<lpage>245</lpage>. <pub-id pub-id-type="doi">10.1016/j.paid.2017.08.014</pub-id></citation>
</ref>
<ref id="B55">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tabachnick</surname> <given-names>B. G.</given-names></name> <name><surname>Fidell</surname> <given-names>L. S.</given-names></name></person-group> (<year>2013</year>). <source>Using Multivariate Statistics</source>. <publisher-loc>Boston, MA</publisher-loc>: <publisher-name>Pearson</publisher-name>.</citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vandenberg</surname> <given-names>R. J.</given-names></name> <name><surname>Lance</surname> <given-names>C. E.</given-names></name></person-group> (<year>2000</year>). <article-title>A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research</article-title>. <source>Org. Res. Methods</source> <volume>3</volume>, <fpage>4</fpage>&#x02013;<lpage>70</lpage>. <pub-id pub-id-type="doi">10.1177/109442810031002</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wessels</surname> <given-names>I.</given-names></name> <name><surname>Rue&#x000DF;</surname> <given-names>J.</given-names></name> <name><surname>Gess</surname> <given-names>C.</given-names></name> <name><surname>Deicke</surname> <given-names>W.</given-names></name> <name><surname>Ziegler</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <article-title>Is research-based learning effective? Evidence from a pre&#x02013;post analysis in the social sciences</article-title>. <source>Stud. High. Educ.</source> <volume>46</volume>, <fpage>2595</fpage>&#x02013;<lpage>2609</lpage>. <pub-id pub-id-type="doi">10.1080/03075079.2020.1739014</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>H.</given-names></name> <name><surname>Seibert</surname> <given-names>S. E.</given-names></name> <name><surname>Lumpkin</surname> <given-names>G. T.</given-names></name></person-group> (<year>2010</year>). <article-title>The relationship of personality to entrepreneurial intentions and performance: A meta-analytic review</article-title>. <source>J. Manag.</source> <volume>36</volume>, <fpage>381</fpage>&#x02013;<lpage>404</lpage>. <pub-id pub-id-type="doi">10.1177/0149206309335187</pub-id></citation>
</ref>
</ref-list>


</back>
</article> 