Teachers' Accuracy in Estimating Social Inclusion of Students With and Without Special Educational Needs

It is unclear to what extent teachers can accurately assess the social inclusion of their students with and without special educational needs (SEN). The study aims to shed light on this open question. Students (N = 1,644) with SEN (learning, behavior, and language problems) and without SEN and their teachers (N = 79) participated in the study. Sociometric peer nominations, students' self-perceived social inclusion, and teachers' assessments of students' social inclusion and self-perceived social inclusion were administered. The results suggest that teachers are moderately accurate in identifying social acceptance and social rejection, whereas accuracy is low when assessing students' self-perceived social inclusion. That said, rating accuracy varied strongly between teachers, ranging from no agreement to near-perfect concordance. Teachers seem to be more accurate in estimating the social acceptance of students with learning problems. The results emphasize the importance of differentiating between various social inclusion criteria (i.e., students' self-reports vs. peer nominations) and of accounting for inter-individual differences in teachers' rating accuracy.


INTRODUCTION
Being part of a social community is a basic psychological need (Deci and Ryan, 1985). A positive social status and a sense of social inclusion are also important conditions for positive cognitive and social-emotional development in children and adolescents (Male, 2007; Rubin et al., 2015; Siegler et al., 2016). Moreover, numerous studies have shown that children with special educational needs (SEN) in inclusive classes are at higher risk of being excluded (Lindsay, 2007; Ruijs and Peetsma, 2009; Avramidis, 2012; Krull et al., 2014, 2018) and have fewer friendships than peers without SEN (Henke et al., 2017; Hoffmann et al., 2020). This puts students with SEN at risk of negative cognitive, emotional, and social development and threatens the goals of the Convention on the Rights of Persons with Disabilities (CRPD), which states that all people with disabilities should receive the support required to facilitate their effective education within an inclusive school setting (article 24, 3, The United Nations, 2006).
Various studies have shown that teachers influence the processes of social interaction and social judgement taking place within their classes (Huber et al., 2018; Wullschleger et al., 2020). They exert this influence through one-to-one interactions with students and through classroom management, for example by implementing social learning settings and seating arrangements (Gest and Rodkin, 2011). Research indicates that in classes where teachers provide a high level of emotional support (Gest and Rodkin, 2011; Hendrickx et al., 2016) and actively shape peer interactions (Gest and Rodkin, 2011), student friendships develop more frequently and more positively.
The present paper does not try to answer which teacher interventions and educational practices are most effective in mitigating social exclusion processes (for that, see Garrote et al., 2017; Huber, 2019). Instead, we address a necessary prerequisite for effective intervention in social inclusion and exclusion: a valid estimation of the state of social inclusion and exclusion in a classroom and detailed knowledge about who is excluded. More precisely, we address the question of how well teachers are able to estimate their students' social inclusion and exclusion. Our main assumption is that, before they can provide adequate support for students' social inclusion, teachers must be familiar with the social structure of their class. We base this assumption on research showing that teachers' knowledge of students' proficiency in a specific subject is an essential prerequisite for effectively planning and implementing lessons and for supporting students individually (Südkamp and Praetorius, 2017). The ability to accurately assess a student's competence level has been termed diagnostic competence (Schrader, 2007; Artelt and Gräsel, 2009). Transferred to the topic of social inclusion, this would be the ability to (a) form a valid and reliable picture of the social interactions among students in the classroom and (b) make differentiated and accurate judgements about the specific characteristics of this social structure and the social inclusion of individual students.

What We Already Know About Teachers' Accuracy in Estimating the Social Inclusion of SEN Students
The research on teachers' accuracy (or diagnostic competence) in estimating the social inclusion of their students is fragmented and limited (Meyer and Ostrosky, 2018). Moreover, the terminology used to describe students' social situations is inconsistent (see Koster et al., 2009). In the following section, we follow the terminology used in the respective studies and therefore speak of social participation, social inclusion, social exclusion, bullying, number of friendships, etc. We are aware that these concepts are not interchangeable, but because studies are scarce, our goal is to review all of the relevant literature. We focus on the extent to which accuracy depends on (1) the criteria of social inclusion, (2) students' SEN, and (3) contextual factors.

Criteria of Social Inclusion
We found only one study comparing teachers' rating accuracy on two different inclusion criteria. Falkmer et al. (2012) suggest that teachers are better at estimating social participation than at estimating forms of social exclusion. They found that teachers were able to adequately estimate the self-perceived social participation of students with autism spectrum conditions, but there was little agreement regarding students' self-perception of being bullied.

Students' SEN
Some studies show that teachers' ratings are less accurate for students with emotional and behavioral problems. A study by Wienke Totura et al. (2009) demonstrated that when students showed a higher level of moodiness, it was much harder for the teachers to tell whether or not the students were victims of bullying. Another study by Liau et al. (2004) found that teachers rated interpersonal violence in children less accurately when the children showed generally high levels of aggression. Similarly, a study comparing the ability of kindergarten teachers to identify friendships in children with and without disabilities (Meyer and Ostrosky, 2018) showed that the teachers performed less accurately for students with disabilities than for students without disabilities. Another finding also documents a lower rating accuracy of teachers in identifying peer relations of students with SEN (Shilshtein and Margalit, 2019): The correlation of children's self-reports and teachers' estimations of peer acceptance was lower in the group of students with learning disabilities than for those without SEN. In contrast, Pearl et al. (2007) found a higher rating accuracy for students with emotional and behavioral problems: Boys whose social network participation was estimated correctly showed a higher level of externalizing behavior (aggressive and troublemaking behavior) than boys who were estimated incorrectly. For girls, rating accuracy was higher when they had a higher level of internalizing behavior (social withdrawal and depression).
The studies reported up to this point examined teachers' accuracy by contrasting their ratings for students with and without SEN. Studies conducted in inclusive classrooms where teachers estimated social interactions only for students with SEN concluded that teachers are overly positive about the frequency of peer relations and tend to overestimate the social inclusion of students and the number of their friendships (Monchy et al., 2004; Koster et al., 2007; Pijl et al., 2008). A study by Monchy et al. (2004) showed that although teachers generally estimated the number of friendships of students with SEN accurately, they misinterpreted these students' sociometric status (e.g., they frequently miscategorized students who were actually rejected by their peers as sociometrically average). These results have been replicated for preschool children with various disabilities (including developmental delays, autism spectrum disorders, and language impairments) by Ferreira et al. (2017). Schwab et al. (2019) report consistent results for hearing-impaired students: Although students with hearing impairments felt less socially integrated and less accepted by their peers, their teachers evaluated their social situation more positively.

Contextual Factors
We found only scattered studies addressing how contextual factors moderate teachers' rating accuracy of their students' social inclusion. Note that the studies described below do not focus on students with SEN. Gest (2006) as well as Neal et al. (2011) reported a positive relationship between students' academic year and teachers' accuracy in estimating their social inclusion. In contrast, Harks and Hannover (2017) reported more accurate teacher ratings in lower grades. Studies on the influence of class size showed that teachers' rating accuracy of social inclusion was lower in large classes (Ahn et al., 2013; Harks and Hannover, 2017; Marucci et al., 2018). Furthermore, the amount of time teachers spent with their students had a positive influence on their accuracy (Harks and Hannover, 2017; Marucci et al., 2018).

Individual Differences in Teachers' Rating Accuracy
In our literature review, we also looked for individual differences in teachers' ability to accurately judge the social inclusion of their students. We did not find a study addressing this aspect in the context of social inclusion. However, studies from related areas suggest large variability in teachers' rating accuracy. In a literature review on teachers' accuracy in rating academic performance, Hoge and Coladarci (1989) reported considerable differences between teachers. A more recent study by Gabriele et al. (2016) also found rating accuracies for mathematical performance ranging from low to high (hit-rate scores between 0.33 and 0.93). Similar results were found for teachers' accuracy in rating students' motivation  and students' goal setting (Dicke et al., 2012).

What Remains Unclear
It remains unclear to what extent teachers can accurately estimate the social inclusion and exclusion of students with and without SEN. Studies on this topic are rare. Moreover, many studies do not include a group of students without SEN, so little can be said about whether teachers are more sensitive to the peer relations of students with SEN than to those of students without SEN. In addition, the studies summarized above do not systematically differentiate between types of SEN (e.g., emotional-behavioral disorders, learning problems, language development problems): They either include one specific kind of SEN or put all students with SEN into one category. Hence, it remains unanswered whether teachers are better able to identify the peer relations of children with specific types of SEN.
Furthermore, it is unclear whether teachers rate the social inclusion of SEN students overly positively compared to students without SEN. Most studies suggesting such a connection only include students with SEN but not both groups (SEN and non-SEN students).
Previous research has operationalized social participation inconsistently, using both sociometric approaches that ask about the most liked and disliked peers and questionnaire-based self-reports on social participation. It is unclear to what extent these measurement differences could explain varying results and inconsistencies between studies (Kulawiak et al., 2020). Similarly, although students' gender and academic year seem to be related to teachers' rating accuracy, these factors have not been systematically included in previous studies.
Finally, we can assume that teachers differ in the accuracy of their social inclusion ratings. This inter-individual variability has not yet been addressed, but insight into it is a prerequisite for future research into what improves or impairs rating accuracy.

Research Questions
The present study is exploratory and aims to shed light on the open questions described above. Our research questions are the following:
Q1: To what extent are teachers accurate in estimating their students' social inclusion?
Q2: To what extent do teachers vary in their rating accuracy?
Q3: Are the results for rating accuracy consistent across different social inclusion criteria (i.e., the degree to which students are accepted or rejected by their peers and their self-perceived social inclusion)?
Q4: Is teachers' rating accuracy higher or lower for students with SEN (i.e., learning problems, behavioral problems, and language problems) compared to students without SEN?
Because previous research results have not been consistent and theoretical models do not yield clear predictions, we do not state explicit hypotheses at this juncture. In addition, it is important to take students' academic year and gender into account as contextual factors.

Special Educational Needs (SEN)
To identify children with and without SEN, we asked classroom teachers to indicate for each student of their class the area in which they were diagnosed with a SEN. In total, they could choose between the following seven categories of support: Learning, emotional and behavioral development, language, intellectual development, physical and motor development, hearing and communication, and vision (multiple choices were possible). In addition, the teachers were asked to indicate in which categories each child required increased support (regardless of whether a SEN had been diagnosed). This two-step approach was necessary because, due to an inclusive educational approach, administrative ascriptive diagnostic procedures have been avoided or suspended in the German primary education system. The responses on diagnosed and additional SEN were condensed into one variable for each SEN category indicating whether a diagnosed or additional SEN was prevalent in that category. Then, five new categories were calculated: Students without SEN, students with SEN in learning but in no other category (learning problems), students with SEN in emotional and behavioral development but in no other category (behavior problems), students with SEN in language but in no other category (language problems), and students with multiple SEN or SEN in a category other than learning, behavior, or language (miscellaneous SEN).
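The grouping rule above can be expressed as a small function. This is a minimal sketch, assuming each student's teacher responses are available as two sets of category labels (diagnosed SEN and increased support); the label strings and the function name `sen_group` are hypothetical and not taken from the study's materials.

```python
from typing import Set

# Hypothetical labels for the seven support categories teachers could tick.
CATEGORIES = ("learning", "behavior", "language", "intellectual",
              "physical_motor", "hearing", "vision")

# Single-category groups analyzed separately; everything else is "miscellaneous".
SINGLE_GROUPS = {"learning": "learning problems",
                 "behavior": "behavior problems",
                 "language": "language problems"}


def sen_group(diagnosed: Set[str], extra_support: Set[str]) -> str:
    """Collapse teacher-reported SEN information into one of five groups."""
    # Step 1: a category counts as present if it was diagnosed OR flagged as
    # needing increased support (the condensed variable per category).
    present = {c for c in CATEGORIES if c in diagnosed or c in extra_support}
    if not present:
        return "without SEN"
    # Step 2: exactly one category, and it is learning, behavior, or language.
    if len(present) == 1:
        (only,) = present
        if only in SINGLE_GROUPS:
            return SINGLE_GROUPS[only]
    # Multiple SEN, or a single SEN outside learning, behavior, and language.
    return "miscellaneous SEN"
```

For example, `sen_group({"learning"}, {"language"})` yields "miscellaneous SEN", because two categories are present once diagnosed and additional SEN are combined.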

Students' Social Acceptance and Rejection by Their Classmates
To evaluate social acceptance and rejection by classmates, a sociometric nomination questionnaire (Moreno, 1996; Cillessen, 1999; Bukowski and Cillessen, 2012) was used. All students were asked to write down the names of the classmates they liked the most (social acceptance) and those they liked the least (social rejection). The number of nominations was unlimited. The children were not allowed to nominate themselves, and answers such as "all girls" or "all boys" were not valid. Students' social acceptance and rejection were calculated as the number of nominations they received on the respective question (indegree).
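Computing indegrees from the nomination lists amounts to counting, for each student, how many classmates named them. A minimal sketch, assuming nominations are stored as a dictionary from each nominator to the names they wrote down; the function name `indegrees` and this data layout are hypothetical. Self-nominations and entries that are not classmates' names (such as "all girls") are discarded, mirroring the validity rules described above.

```python
from collections import Counter
from typing import Dict, Iterable, List


def indegrees(nominations: Dict[str, Iterable[str]], roster: List[str]) -> Dict[str, int]:
    """Count how many classmates nominated each student on one question."""
    received: Counter = Counter()
    for nominator, nominees in nominations.items():
        # set() ensures a nominator is counted at most once per nominee
        for nominee in set(nominees):
            # invalid: self-nominations and anything not on the class roster
            if nominee != nominator and nominee in roster:
                received[nominee] += 1
    # students who were never named receive an indegree of 0
    return {student: received[student] for student in roster}


liked_most = {"Ana": ["Ben", "Cem"], "Ben": ["Ana", "Ben"],
              "Cem": ["Ben", "all girls"], "Dina": []}
print(indegrees(liked_most, ["Ana", "Ben", "Cem", "Dina"]))
# {'Ana': 1, 'Ben': 2, 'Cem': 1, 'Dina': 0}
```

The same function applied to the "liked least" lists gives the social rejection indegrees.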

Students' Self-Perceived Social Inclusion
To assess self-perceived social inclusion, a shortened version (6 instead of 11 items) of the subscale "social inclusion" from the FEESS questionnaire (German acronym for "questionnaire for assessment of emotional and social school experiences"; Rauer and Schuck, 2003) was administered (example items: "My classmates are nice to me" and "I get along well with my classmates"). Participants in the first and second grades indicated whether or not they agreed with each statement. Third and fourth graders indicated the extent to which the statements applied to them on a four-point Likert scale ("strongly disagree," "disagree," "agree," "strongly agree"). To obtain a consistent response format across grades, the four-point responses were collapsed into two categories ("agree" and "disagree"). The internal consistency of the social inclusion subscale in the present data was Cronbach's α = 0.66.
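The dichotomization and the reliability estimate can be sketched as follows. This assumes the four-point answers are coded 1 ("strongly disagree") to 4 ("strongly agree") and that the two lower categories map to "disagree"; the coding and the function names are illustrative assumptions. Cronbach's α uses the standard formula α = k/(k − 1) · (1 − sum of item variances / variance of the sum score).

```python
from statistics import pvariance
from typing import Sequence


def to_binary(response: int) -> int:
    """Collapse a four-point answer to agree (1) / disagree (0)."""
    # assumed coding: 1 = strongly disagree, 2 = disagree, 3 = agree, 4 = strongly agree
    if response not in (1, 2, 3, 4):
        raise ValueError(f"expected a response from 1 to 4, got {response!r}")
    return 1 if response >= 3 else 0


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """scores[i][j] is respondent i's score on item j (complete cases only)."""
    k = len(scores[0])
    # population variances; the n/(n - 1) correction would cancel in the ratio anyway
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


third_grader = [4, 3, 2, 1, 3, 4]  # six items, four-point answers
print([to_binary(r) for r in third_grader])  # [1, 1, 0, 0, 1, 1]
```

After dichotomizing every third and fourth grader's answers, all students share the same 0/1 item format, and α can be computed on the pooled item matrix.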

Teachers' Assessment of Students' Social Inclusion
The class teachers were asked to assess the social inclusion of each student with three questions corresponding to the three student measures.
For students' social acceptance, they were given the question "In a sociometric questionnaire, your students will be asked which classmates they particularly like. How often do you think this student is selected by the other children in the class?" They could answer on a five-point Likert scale ("never," "seldom," "sometimes," "often," and "very often"). We refer to this variable as teacher rating social acceptance.
For students' social rejection, the question was "Furthermore, the students will be asked which classmates they do not like. How often do you think this student is selected by the other children in the class?" The same Likert scale was provided ("never," "seldom," "sometimes," "often," and "very often"). We refer to this variable as teacher rating social rejection.
For students' self-perceived social inclusion, the question was "The students will also be asked how much they feel socially integrated into their class. How much do you think your student feels socially integrated in his/her class?" Responses could be given on a five-point Likert scale ("not at all," "a little," "moderately," "mostly," and "completely"). We refer to this variable as teacher rating self-perceived social inclusion. A student could belong to more than one category.

Participants
The present study is part of a 4-year German research project (see Hennemann et al., 2018; Urton et al., 2018). Data were collected in 2018 in nine inclusive primary schools in an urban district in the federal state of North Rhine-Westphalia, Germany. The original sample included 2,020 students and their 86 class teachers. To obtain a coherent sample, teachers and their students were excluded when fewer than 10 valid ratings were available for that class. A rating was considered valid when the teacher rated a student's social inclusion and the corresponding data for that student (social acceptance and rejection or self-perceived social inclusion) were available. This procedure resulted in a sample of 79 teachers (median age category: 41-50 years; 92% female; median teaching experience category: 4-10 years) with 1,644 students. The students were between 6 and 12 years old and attended grades one to four. The number of students (Min = 357; Max = 442) and the gender ratio (Min = 47.1%; Max = 50.1%) were approximately evenly distributed across grades (Table 1). The percentage of children with each type of SEN is shown in Table 2 (one student could belong to more than one category). Behavior (8.76%), learning (8.52%), and language (6.27%) were the most frequent types of SEN, while intellectual (0.18%) and vision (0.24%) were the least frequent. This distribution is in line with the federal state's policy of primarily including students who are struggling with learning, behavior, or language in mainstream schools.
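The exclusion rule can be made explicit as a filter over classes. A minimal sketch, assuming each student record stores the three teacher ratings and the three student measures under the hypothetical field names below (None marking missing data), and interpreting "corresponding data" as at least one teacher rating paired with its matching student measure; that interpretation is our assumption.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]

# Hypothetical field names: each teacher rating paired with the student measure it targets.
PAIRS = (("t_acceptance", "acceptance"),
         ("t_rejection", "rejection"),
         ("t_self", "self_perceived"))


def is_valid(record: Record) -> bool:
    # valid if at least one teacher rating has its matching student score (None = missing)
    return any(record.get(t) is not None and record.get(s) is not None
               for t, s in PAIRS)


def keep_classes(classes: Dict[str, List[Record]],
                 min_valid: int = 10) -> Dict[str, List[Record]]:
    # drop a class (teacher and students) with fewer than min_valid valid ratings
    return {cid: recs for cid, recs in classes.items()
            if sum(is_valid(r) for r in recs) >= min_valid}
```

Applied to the original 86 classes, a rule of this kind would remove every class whose count of valid ratings falls below 10.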

Procedure
Data collection took place from February to April 2018 (from the beginning to the middle of the second school semester). Graduate and undergraduate students working in pairs collected the sociometric data as well as the FEESS data. These data collectors were trained and followed a standardized data collection script. All children from the second to fourth grades filled out both questionnaires within 45 min in the classroom unless, in the teachers' opinion, they needed special support in answering the questions. Most first graders were interviewed individually (20 min) in a separate room because their reading and writing skills were not yet sufficient. While the children were being interviewed, the classroom teachers filled out a 10-min questionnaire for each student in their class. The study was approved by the district's education authority (approval criteria: compliance with data protection regulations and educational relevance of the research), and all participating children had a declaration of consent from their parents or legal guardians. Additional ethics approval was not required in accordance with national legislation and institutional requirements.

Descriptive Statistics
After assigning a SEN category to each student as described in the materials and measures section, around 20% of the students belonged to a SEN category (Table 3). Behavioral problems (6.1%) and learning problems (5.5%) had about the same prevalence, followed by language problems (4.5%). Four percent of all students belonged to the miscellaneous category, which comprises all students with a combination of several SEN or with physical, intellectual, hearing, or vision problems.

Teachers' Rating Accuracy (Research Questions Q1/Q3)
In the first step, we calculated a correlation between teacher ratings and students' attributes for all three criteria of social inclusion, disregarding the nested data structure (see Table 4). For social acceptance, the correlation was medium-sized (r = 0.38, p < 0.001) and tended to be large for social rejection (r = 0.47, p < 0.001). For self-perceived social inclusion, the correlation was small to medium (r = 0.29, p < 0.001).
For a more detailed insight into the rating accuracy, we plotted the distribution of students' attributes against each level of teachers' ratings (ranging from 0 to 4; see Figure 1). For social acceptance and social rejection, we rescaled students' indegrees to a percentage value (i.e., percentage of peers in the class), where 100% indicated that a student was chosen (social acceptance) or rejected (social rejection) by all peers in that class.
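Both steps are simple to reproduce. The sketch below converts an indegree into the percentage of possible nominators, assuming the student is one of the respondents and cannot nominate themselves (so the denominator is the number of respondents minus one), and computes a Pearson correlation over all students pooled across classes, that is, ignoring the nesting, as in the first analysis step. The function names are illustrative.

```python
from math import sqrt
from typing import Sequence


def indegree_percent(indegree: int, n_respondents: int) -> float:
    """Indegree as a percentage of the classmates who could have nominated the student."""
    # assumption: the student answered too and could not nominate themselves
    possible = n_respondents - 1
    if possible <= 0:
        raise ValueError("need at least one possible nominator")
    return 100.0 * indegree / possible


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; here applied to all students pooled across classes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


print(indegree_percent(5, 21))  # 25.0
```

Expressing indegrees as percentages makes students from classes of different sizes comparable on one axis, which is what the rescaling for Figure 1 aims at.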

Social Acceptance
Students receiving the highest rating (4: "very often") were nominated by a median of 47% of their classmates, whereas students receiving the lowest rating (0: "never") were nominated by a median of 12%. The distributions were wide: Social acceptance values of students rated 0 ranged from 0 to 50%, and those of students rated 4 ranged from 7 to 90%. The five distributions overlapped strongly, indicating a low degree of differentiation between teachers' rating levels. The medians were close to the regression line (except for rating level 0, which was slightly below it), indicating a linear relation between teachers' ratings and students' indegrees (Figure 1A).

Social Rejection
The picture is similar here: Students receiving the highest rating (4) were nominated by a median of 60% of their classmates, whereas students receiving the lowest rating (0) were nominated by a median of 13%. Again, the distributions were wide (from 0 to 72% for teacher rating 0 and from 26 to 93% for teacher rating 4). The regression line indicates a linear relation from categories 0 to 3, but for category 4 the values lie above the line. This indicates that a much larger increase in social rejection is necessary to move from category 3 to 4 than from 0 to 1, 1 to 2, or 2 to 3. That is, only students with a particularly high level of social rejection were rated as "very often" rejected (Figure 1B).

Self-Perceived Social Inclusion
When teachers chose the lowest category (0: "not at all"), the self-perceived social inclusion values ranged from Z = −3.2 to Z = 0.2, with a median of Z = −1.9. When they chose the highest category (4: "completely"), values ranged from Z = −3.2 to Z = 0.8, with a median of Z = 0.8. That is, many students with below-average self-perceived social inclusion were rated by their teachers as having a positive or very positive self-perception, resulting in a ceiling effect for the highest category. The median of category 0 was much lower than predicted by the regression line, and the median of category 4 was above it, indicating a non-linear relation. This suggests that teachers assigned only students with very low self-perceived social inclusion to the lowest category (Figure 1C).
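The per-level descriptions above (medians and ranges per teacher rating level) correspond to the boxplot statistics defined in the caption of Figure 1. The following sketch computes those statistics for each rating level; it assumes linear-interpolation quartiles (R's default quantile type 7), which may differ slightly from the method used for the published figure. Function names are illustrative.

```python
from collections import defaultdict
from math import floor
from typing import Dict, List, Sequence


def quantile7(sorted_vals: Sequence[float], p: float) -> float:
    """Linear-interpolation quantile (R's default, type 7) of sorted data."""
    h = (len(sorted_vals) - 1) * p
    lo = floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def box_stats(values: Sequence[float]) -> dict:
    v = sorted(values)
    q1, med, q3 = (quantile7(v, p) for p in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in v if lo_fence <= x <= hi_fence]
    return {"median": med, "q1": q1, "q3": q3,
            # whiskers end at the most extreme observations within 1.5 * IQR of the hinges
            "whiskers": (inside[0], inside[-1]),
            # everything beyond the whiskers is drawn as a dot (outlier)
            "outliers": [x for x in v if x < lo_fence or x > hi_fence]}


def by_rating_level(ratings: Sequence[int], scores: Sequence[float]) -> Dict[int, dict]:
    """Group students' scores by the teacher's rating level (0-4) and summarize each group."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for rating, score in zip(ratings, scores):
        groups[rating].append(score)
    return {level: box_stats(vals) for level, vals in sorted(groups.items())}
```

Comparing the medians returned by `by_rating_level` with a fitted regression line is one way to check the (non-)linearity discussed for each criterion.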

Differences Between Teachers in Rating Accuracy (Research Questions Q2/Q3)
We calculated correlation coefficients for all three social inclusion criteria separately for each teacher (the correlation of a teacher's ratings of social inclusion with the scores derived from the students' measures). Table 5 shows statistical indices for these correlations (correlations were not Fisher-Z transformed): The mean values were close to the values reported above (this time averaged across teachers), with medium to large correlations for social acceptance (r = 0.53, p < 0.001) and social rejection (r = 0.55, p < 0.001). The mean value for self-perceived social inclusion, again, was small to medium-sized (r = 0.29, p < 0.001), indicating weak concordance here. The variability was very high for all three criteria (minima between r = −1.0, p < 0.001, and r = −0.02, p = 0.95; maxima between r = 0.86, p < 0.001, and r = 0.92, p < 0.001), indicating considerable differences between teachers. The exact distribution of correlations is depicted in Figure 2.

FIGURE 1 | (A-C)
Concordance between teachers' ratings (from 0 to 4) and students' attributes (in percentage of indegrees received from participating class peers for the sociometric measures and Z-values for self-perceived social inclusion). The horizontal lines within the boxplots indicate the median; the lower and upper hinges correspond to the first and third quartiles; the whiskers extend to the largest and smallest values, but no further than 1.5 times the interquartile range (difference between the third and first quartiles) from the respective hinge. All values outside the whiskers are depicted as dots and considered outliers. The diagonal lines depict regression lines, with gray areas indicating the standard errors.

FIGURE 2 | Distribution of correlations between teacher ratings and students' attributes for three criteria of social inclusion. For an explanation of the boxplots, see Figure 1.
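The per-teacher analysis can be sketched as one correlation per class, summarized across teachers. The sketch assumes data grouped by teacher as pairs of lists (teacher ratings, student scores). It reports the plain mean of the raw correlations, as described above (no Fisher-Z transformation), and, for comparison only, a Fisher-Z-based mean; because the Fisher-Z transform is infinite at |r| = 1, the sketch clips r to ±0.999999, an arbitrary illustrative choice. A teacher who gives every student the same rating has an undefined correlation, which the sketch returns as NaN and skips.

```python
from math import atanh, isnan, sqrt, tanh
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined, e.g., identical ratings for all students
    return sxy / sqrt(sxx * syy)


def per_teacher_r(data: Dict[str, Tuple[List[float], List[float]]]) -> Dict[str, float]:
    # one correlation per teacher: that teacher's ratings vs. their students' scores
    return {t: pearson_r(ratings, scores) for t, (ratings, scores) in data.items()}


def summarize(rs: Dict[str, float], clip: float = 0.999999) -> Dict[str, float]:
    vals = [r for r in rs.values() if not isnan(r)]
    fisher = mean(atanh(max(-clip, min(clip, r))) for r in vals)
    return {"n": len(vals), "mean_r": mean(vals),
            "min_r": min(vals), "max_r": max(vals),
            # comparison only: average on the Fisher-Z scale, then back-transform
            "fisher_mean_r": tanh(fisher)}
```

With small classes, a few students can produce extreme per-teacher correlations (up to ±1), which is one reason the spread of these coefficients should be read alongside class sizes.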

Rating Accuracy and SEN (Research Questions Q4/Q3)
To estimate the influence of SEN on rating accuracy, we set up three multilevel regression models. Model 1 predicted students' social acceptance, model 2 their social rejection, and model 3 their self-perceived social inclusion (see Table 6). Control variables were class grade, gender, and class size (the number of students per class who answered the sociometric questions). The four SEN categories were included as dummy predictors (with students without SEN as the reference category). Each model included the corresponding teacher rating: teacher rating social acceptance for model 1, teacher rating social rejection for model 2, and teacher rating self-perceived social inclusion for model 3. Furthermore, the interactions of each SEN category, gender, class grade, and class size with the respective teacher rating variable were included. All predictor and criterion variables were standardized except for gender, which was Helmert-coded (−1 for female and 1 for male), and the SEN categories, which were dummy-coded. Because students were nested in classes, the class identifier was included as a random factor. The analyses were conducted with the lmer function of the R package lme4 (Bates et al., 2015; R Core Team, 2020).
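The predictor coding described above can be illustrated by building the fixed-effects rows for one model. This is a sketch under assumptions: variable names are hypothetical, standardization uses the n − 1 standard deviation (as R's `scale()` does), the standardized criterion is not shown, and model fitting itself is omitted. In lme4, a model of this structure might be written as `y ~ tr * (grade + gender + class_size + sen) + (1 | class_id)`, with `tr` the standardized teacher rating and gender coded −1/1; this formula is our illustration, not the authors' published code.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence

# Dummy-coded SEN groups; "without SEN" is the reference (all dummies 0).
SEN_LEVELS = ("learning problems", "behavior problems",
              "language problems", "miscellaneous SEN")


def zscore(xs: Sequence[float]) -> List[float]:
    # standardize with the sample SD (n - 1), like R's scale()
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def design_rows(grade: Sequence[float], gender: Sequence[str],
                class_size: Sequence[float], sen: Sequence[str],
                teacher_rating: Sequence[float]) -> List[Dict[str, float]]:
    g, c, t = zscore(grade), zscore(class_size), zscore(teacher_rating)
    rows = []
    for i in range(len(t)):
        row = {"grade": g[i], "class_size": c[i], "teacher_rating": t[i],
               # Helmert-type contrast for two levels: female = -1, male = +1
               "gender": 1.0 if gender[i] == "male" else -1.0}
        for level in SEN_LEVELS:
            row[level] = 1.0 if sen[i] == level else 0.0
        # every moderator also enters as a product with the teacher rating
        for mod in ("grade", "gender", "class_size") + SEN_LEVELS:
            row[f"{mod}:teacher_rating"] = row[mod] * t[i]
        rows.append(row)
    return rows
```

Because the teacher rating is standardized, each product term directly shifts the teacher-rating slope for the corresponding group or moderator value.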

Social Acceptance
We first report the general effects on social acceptance and then the results for the accuracy of the teacher ratings. Class grade was a positive predictor of students' social acceptance (B = 0.19, p < 0.001), while there were no significant gender differences. With regard to class size, students received significantly more sociometric nominations in larger classes (B = 0.68, p < 0.001). Students with learning problems (B = −0.20, p = 0.030) and students with behavioral problems (B = −0.25, p = 0.006) were significantly less socially accepted. The regression weights for language problems and miscellaneous SEN were not significant. Teacher ratings of social acceptance were significantly associated with students' actual social acceptance (B = 0.36, p < 0.001). Class grade and gender did not significantly moderate this association, but teacher ratings were more accurate with increasing class size (B = 0.12, p < 0.001). With respect to SEN, ratings were more accurate for students with learning problems (B = 0.18, p = 0.039) and for students with miscellaneous SEN (B = 0.31, p = 0.001). Behavioral problems and language problems did not significantly moderate teachers' rating accuracy.
Frontiers in Education | www.frontiersin.org
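Because the continuous variables were standardized and the SEN groups dummy-coded, each teacher-rating interaction can be read as a change in the teacher-rating slope. The sketch below adds the coefficients reported in this paragraph to obtain such simple slopes. It is illustrative only: non-significant interactions are not reported numerically here and are therefore omitted, class grade and class size are held at their mean (z = 0), and gender is held at the midpoint of its −1/1 coding. The function name is hypothetical.

```python
from typing import Dict

# Coefficients reported above for model 1 (social acceptance).
B_TEACHER_RATING = 0.36
B_INTERACTIONS = {"learning problems": 0.18,   # SEN dummy x teacher rating
                  "miscellaneous SEN": 0.31,   # SEN dummy x teacher rating
                  "class_size": 0.12}          # standardized class size x teacher rating


def simple_slope(b_rating: float, interactions: Dict[str, float],
                 moderators: Dict[str, float]) -> float:
    """Teacher-rating slope for given moderator values (unlisted moderators = 0)."""
    return b_rating + sum(b * moderators.get(name, 0.0)
                          for name, b in interactions.items())


# Student with learning problems in a class of average size
print(round(simple_slope(B_TEACHER_RATING, B_INTERACTIONS, {"learning problems": 1}), 2))  # 0.54
```

On this reading, the slope linking teacher ratings to actual acceptance is about 0.54 standardized units for students with learning problems, compared with 0.36 for students without SEN.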

Social Rejection
Class grade did not significantly predict students' social rejection. Gender (B = 0.12, p < 0.001) and class size (B = 0.47, p < 0.001) were significant predictors. Students with learning problems (B = 0.32, p < 0.001), behavioral problems (B = 0.39, p < 0.001), and miscellaneous SEN (B = 0.29, p < 0.001) were significantly more socially rejected, while language problems were not a significant predictor. Teacher ratings of social rejection were significantly associated with students' actual social rejection (B = 0.41, p < 0.001). This association was significantly stronger in higher class grades (B = 0.05, p = 0.033) and in larger classes (B = 0.18, p < 0.001), while students' gender did not show a significant moderation. None of the SEN categories significantly moderated teachers' rating accuracy.

Self-Perceived Social Inclusion
Class grade significantly predicted students' self-perceived social inclusion, while students' gender did not. Class size was included only in the first two models, because the sociometric indegrees had to be controlled for the number of students in the class who could give nominations. Students with learning problems (B = −0.33, p = 0.007) perceived their social inclusion to be significantly lower. The other SEN categories did not significantly predict self-perceived social inclusion (although the regression weight for students with miscellaneous SEN was of considerable size: B = −0.27, p = 0.060). Teachers' ratings were significantly associated with students' self-perceptions (B = 0.30, p < 0.001).
The correlation was somewhat lower for male students (B = −0.07, p = 0.006) and somewhat higher with increasing class grade (B = 0.09, p = 0.004). None of the SEN categories showed a significant interaction with teachers' rating accuracy for students' self-perceived social inclusion.
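With gender coded −1 (female) and 1 (male), the gender interaction shifts the teacher-rating slope in opposite directions for girls and boys. Using the coefficients reported for model 3, holding class grade at its mean (z = 0) and setting the non-significant SEN interactions aside, the implied simple slopes are as follows (an illustrative calculation, not an additional result):

```latex
\begin{aligned}
b_{\text{TR}}(\text{gender}) &= B_{\text{TR}} + B_{\text{TR}\times\text{gender}} \cdot \text{gender} \\
b_{\text{TR}}(\text{female}) &= 0.30 + (-0.07)(-1) = 0.37 \\
b_{\text{TR}}(\text{male})   &= 0.30 + (-0.07)(+1) = 0.23
\end{aligned}
```

The difference between the two slopes is therefore 2 × 0.07 = 0.14 standardized units.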

Summary and Interpretation
The social integration of students with SEN is an important indicator of a successful inclusive school system (Artiles et al., 2006). Whether this can be achieved depends, to a large extent, on teachers' behavior (Farmer et al., 2011;Gest and Rodkin, 2011;Hendrickx et al., 2016). Adequate support for social inclusion processes is preceded by teachers' perceptions of the social processes within the class. Accordingly, the aim of the present study was to examine the extent to which teachers are able to assess the social inclusion of their students. First, in line with previous studies (Lindsay, 2007;Ruijs and Peetsma, 2009;Krull et al., 2014Krull et al., , 2018, our investigation shows that students with learning and behavioral problems are less often accepted and more often rejected by their classmates compared to other students. Moreover, students with learning problems do not feel socially integrated to the same extent as their peers. Considering the ability to which teachers are able to assess the social inclusion of their students in terms of different criteria of social inclusion (research questions Q1/Q3), our results show that they are similarly accurate in assessing students' social acceptance and social rejection status and less accurate in estimating students' self-perceived social inclusion. We also find a slightly lower rating accuracy for social acceptance than previous studies (Ahn and Rodkin, 2014;Südkamp et al., 2018). Overall, teachers' rating accuracy regarding their students' social inclusion is moderate and somewhat lower than that for students' academic performance (see Hoge and Coladarci, 1989;Südkamp et al., 2012).
Furthermore, our results reveal very high variability in teachers' rating accuracy (research questions Q2/Q3), similar to studies on teachers' rating accuracy in other domains (Dicke et al., 2012;Gabriele et al., 2016;Praetorius et al., 2017). This variability is particularly pronounced for the assessment of students' self-perceived social inclusion. It indicates considerable differences in teachers' diagnostic competence, which may be related to differences in teachers' information-processing capacity or in their judgement criteria (van Ophuysen and Behrmann, 2015).
Our results indicate that teachers' rating accuracy varies depending on students' SEN (research questions Q4/Q2). The pattern here is complex: the social acceptance of students with learning problems and miscellaneous SEN was rated more accurately, whereas no such effects were present for social rejection or self-perceived social inclusion. Moreover, no differences in rating accuracy were found for students with behavioral or language problems on any of the three criteria of social inclusion. This contrasts with Pearl et al. (2007), who found higher accuracy for students with behavioral problems.
Overall, SEN has only a weak influence on teachers' rating accuracy. Moreover, our results indicate the need to differentiate between several types of SEN (O'Mara et al., 2012), as well as to include various operationalizations of social inclusion. Finally, our study shows that teachers in higher class grades are slightly better at estimating the social inclusion status of their students. With respect to gender, teachers seem to be better at estimating the self-perceived social inclusion of female students. Both results stress the importance of including these moderators when analyzing teachers' rating accuracy.

Limitations
The results of the present study must be interpreted with some reservations. First, SEN status was not diagnosed with a standardized instrument but was either estimated by teachers or based on the official SEN assessment process conducted in Germany. This might be particularly critical for language problems, as students with German as a second language may have been wrongly assigned a SEN in language. Second, the self-perceived social inclusion scale had low internal consistency, which might account for the lower teacher rating accuracy for this criterion (as well as the low conditional R² of model 3 in Table 6). Third, a teacher's rating of a student's self-perceived social inclusion was based on a single item; a multi-item scale would probably yield a more reliable estimate (Südkamp et al., 2018).
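The argument that low internal consistency can depress observed accuracy follows from classical test theory: by Spearman's attenuation formula, an observed correlation is expected to equal the true correlation multiplied by the square root of the product of the two measures' reliabilities. The Python sketch below shows Cronbach's alpha for a multi-item scale together with that formula. The function names are illustrative, and the reliability values in the example are invented; the study's actual reliability coefficients are not reported in this section.

```python
# Sketch of two classical test theory quantities (illustrative values only):
# Cronbach's alpha for a multi-item scale and Spearman's attenuation formula.
from statistics import pvariance


def cronbach_alpha(items: list[list[float]]) -> float:
    """items[i][j] is respondent j's score on item i; requires at least 2 items."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # scale score per respondent
    sum_item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


def attenuated_r(r_true: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation given the true correlation and both reliabilities."""
    return r_true * (rel_x * rel_y) ** 0.5


# Hypothetical example: a true association of 0.50, measured with reliabilities
# of 0.81 and 0.64, is expected to appear as an observed correlation of about 0.36.
print(round(attenuated_r(0.50, 0.81, 0.64), 2))
```

Alpha is undefined for a single item (the k / (k - 1) factor divides by zero), so the internal consistency of a one-item teacher rating cannot be estimated this way, which is one more reason to prefer a multi-item scale.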

Implications and Further Research
The Realistic Accuracy Model (Funder, 2012) states that the accuracy of a diagnostic judgement is influenced by three sets of features: (a) characteristics of the criterion to be observed, (b) characteristics of the observer, and (c) characteristics of the observed person. Accordingly, future studies should (a) precisely define which aspect of social inclusion they address and how it can best be operationalized in a classroom, (b) identify which abilities and characteristics of teachers influence their rating accuracy, and (c) examine which student characteristics are associated with higher (or lower) rating accuracy.
We think it is particularly important to investigate the high variability of rating accuracy between teachers. Identifying the competences teachers need to detect social exclusion processes in their classrooms will help to teach these skills to prospective teachers during their university education. This, in turn, will help these future teachers create a classroom climate in which all students receive the support required to facilitate their effective education within an inclusive school setting.

DATA AVAILABILITY STATEMENT
The data set cannot be made publicly available because informed consent from study participants did not cover public deposition of data. However, the minimal data set underlying the findings presented in this article is archived and will be made available by the authors for all interested researchers.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
JW, KU, JK, PK, and AS conceptualized and wrote the manuscript. JK, KU, JW, and TH planned and organized the data collection. JW conducted the statistical analyses and created the figures and tables. All authors contributed to the article and approved the submitted version.