A multilevel investigation of factors related to achievement in Ireland and Spain using PISA data

The Programme for International Student Assessment (PISA) is a methodology for making comparative judgments about the quality of education systems. Celebrated by proponents as a transparent process that allows policy makers to produce data-informed judgments about the relative quality of their national education system, PISA, and through it the OECD, has become a key vehicle for informing and explaining educational policy development. This paper explores the Irish and Spanish outcomes of the 2018 round of PISA. It examines the contextual factors that are associated with performance at student and school level while at the same time developing a multi-level statistical model to explain divergent school performance profiles. It finds that issues associated with the socio-economic level of the students, the repetition rate, and student age are common across all domains in both countries. It suggests that the socio-economic status of Spanish students at school level is not significant, that the shortage of teachers does not appear to affect student performance in Ireland, and that immigrant status does not disadvantage Spanish student performance. It concludes by suggesting that studies involving a wider application of the model be undertaken to assess possible social, economic, and cultural causes that may explain the differences in variable significance in each country.

decades (Schleicher, 2019; Berliner, 2020; Seitzer et al., 2021). Because PISA provides robust data in key knowledge domains, society in general and policymakers are able, whether in perception or in fact, to make informed judgments as to the relative strengths and weaknesses of educational provision and to make changes to this provision that are data-informed and success-oriented. However, given its systemic influence, it is perhaps unsurprising that there has been a longstanding critique of PISA in terms of both its procedures and its underlying assumptions (Grek, 2009; Ozga, 2012; Loveless, 2013; Hopfenbeck et al., 2018). Among the most consistent criticisms has been the linking of the PISA process to a fundamental reorientation of educational provision that sees it not as a public good in and of itself but rather as an arm of economic policy that seeks to maximize return on investment through the prioritization of certain forms of knowledge (Crossley and Watson, 2009; Lingard et al., 2015; Sjøberg, 2015; Zhao, 2018). Often linked to the political philosophy of 'neo-liberalism', this critique argues that the comparative function of PISA has resulted not in an improvement of educational provision but rather in a narrowing of how we assess the value of education and an increased 'datafication' of our understanding of what educational quality is (Grek, 2009, 2021; Brown et al., 2016; Jarke and Breiter, 2019; Bandola-Gill et al., 2021).
Notwithstanding the importance of the debate that has raged around PISA in scholarly circles, the fact remains that it has unrivaled influence as a source of comparative judgment about educational quality internationally. Policymakers, educators, journalists and the public regularly use the results of PISA and, more specifically, the tabular presentation of those results to compare education systems (Feniger and Lefstein, 2014). While there might be an argument as to the robustness of these comparative judgments, and indeed the OECD would argue strongly against this use of 'snapshot' judgments, preferring a more nuanced exploration of trends in the data produced, the fact remains that this is a common and highly influential outcome of the PISA process (Sjøberg and Jenkins, 2022). However, these comparisons often miss a critical and in-depth exploration of the data sets underpinning the "league tables." Moreover, even when some aspects of this data are examined and discussed, common comparative analysis rarely explores the key underpinning concepts that dominate the discourse. Thus, important concepts such as the meaning of educational quality (Brown et al., 2016), the nature and validity of educational measurement (Moss, 1992) and the understanding of how both quality and measurement influence and are influenced by understandings of effectiveness across all stages of educational provision are rarely examined in single-country, let alone cross-country, studies. This paper seeks to address this lacuna by exploring the data that emerged from the most recent round of PISA testing in 2018 regarding educational provision in Ireland and Spain. More specifically, it seeks to explore the contextual factors that are associated with student performance at the student and school level while at the same time using a multi-level statistical model that seeks to explain school performance profiles in PISA 2018 that appear to differ from the norm.
The concept of educational quality has been at the center of many theoretical debates. Indeed, Drew and Healy (2006) are of the view that "quality has always been a particularly difficult concept to define, and many academics have struggled to provide the all-encompassing definition" (p. 361). Despite this, Kumar (2010) attempts to provide a working definition, arguing that quality can have two meanings: the first is "the essential attribute with which something may be identified" (p. 8), for example an institutional or systemic ethos, and the second is the "rank of, or superiority of one thing over another," for example league tables.
The definition of quality espoused by the PISA process is, in practice and general usage at least, located firmly within the second understanding as defined by Kumar insofar as it ranks education systems in comparison to each other across a set of predefined criteria. While clearly a methodologically and conceptually coherent approach, this does leave the process open to legitimate questions regarding both the narrowness of its definition and the manner in which it seeks to operationalize this definition. Even if we choose to ignore the issue of ethos suggested by Kumar (2010), definitions of quality used for comparison purposes that are shorn of ideas of context and culture, to take just one set of factors, have led to a widespread debate as to the value of comparative studies such as PISA. In this regard, secondary analyses of PISA data performed by independent educational researchers are of great value in highlighting the relevance of contextual factors to academic achievement. Examples include studies by authors from several countries (Gamazo et al., 2017; Costa and Araújo, 2018; Bokhove et al., 2019; Gamazo and Martínez-Abad, 2020; Wu et al., 2020).
This link between context and comparison is at the heart of this paper and is explored in the next section. Drawing on the complete 2018 PISA data set for Ireland and Spain, the authors will seek to provide a comparative overview of the performance of Irish and Spanish students across each of the PISA domains while at the same time using a statistical model to explain this performance in a contextual manner, offering a judgment of school quality that relies not on raw data alone but on a nuanced engagement with the nature and characteristics of the school systems being compared and the contexts in which they are embedded. First, however, a description of the methodology that was used in the study is provided.

Objectives
The main objective of this study is to determine which contextual factors at the student and school level are associated with the educational performance of Irish and Spanish students in each of the three skills of mathematics, science and reading literacy.

Instruments
The data underlying the comparison presented here, on the variables associated with the academic performance of 15- and 16-year-old students in Spain and Ireland, come from the PISA 2018 instruments. It is important to differentiate between the context questionnaires and the competency assessment booklets, i.e., the actual assessment tests that were administered to students. In most countries, these tests were administered by computer. The total duration was approximately 2 h and 35 min (2 h for the assessment tests and 35 min for the context questionnaire; Ministerio de Educación y Formación Profesional, Gobierno de España, 2019b). The context questionnaire "provides information about students, their attitudes, dispositions and perceptions; their homes and their experience of school and learning" (Ministerio de Educación y Formación Profesional, Gobierno de España, 2019b, p. 4).
In addition to the context questionnaire completed by students, data was also incorporated from school principals reporting on aspects of school management and the learning environment in their schools.
In the case of Spain, all these data were also complemented with other contextual data from teachers' answers to a specific questionnaire on their personal characteristics, as well as a series of data provided by the students' parents.
In each cycle of PISA, emphasis is placed on a specific competence. In the 2018 edition, the main competence was reading comprehension, with mathematics and science as secondary areas, global competence as the innovative domain, and financial competence as an international option (Ministerio de Educación y Formación Profesional, Gobierno de España, 2019a).
The assessment booklets include both open questions and multiple-choice questions. Depending on the competence, different combinations of these two types of questions were presented.

Sample
The total population participating in PISA 2018 consisted of students aged between 15 years 3 months and 16 years 2 months from 79 participating countries. Around 600,000 students were tested, representing a total of 32 million students. The target population for our analysis, the students and schools in Ireland and Spain that participated in the 2018 edition of PISA, comprised 1,246 schools and 41,520 students.
Our sample consisted of schools with a minimum of 20 students, together with their students, in line with previous research (Joaristi et al., 2014; Gamazo et al., 2017; Martínez Abad et al., 2017). For Ireland, this sampling strategy yields 155 schools and 5,551 students; for Spain, 976 schools and 34,411 students.
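The school-size filter described above can be illustrated with a minimal sketch. It assumes student records are held as dictionaries keyed by a school identifier; the field name `school_id` and the helper `filter_small_schools` are hypothetical and are not taken from the PISA database or from the authors' procedure.

```python
from collections import Counter

MIN_STUDENTS = 20  # threshold stated in the text


def filter_small_schools(students, min_students=MIN_STUDENTS):
    """Keep only students whose school has at least `min_students` sampled students.

    `students` is a list of dicts with a hypothetical "school_id" key.
    """
    counts = Counter(s["school_id"] for s in students)
    return [s for s in students if counts[s["school_id"]] >= min_students]


# Toy data (invented): school "A" has 20 sampled students, school "B" only 3.
toy = [{"school_id": "A"} for _ in range(20)] + [{"school_id": "B"} for _ in range(3)]
kept = filter_small_schools(toy)
print(len(kept), sorted({s["school_id"] for s in kept}))
```

In this toy case only the students of school "A" are retained, which mirrors how schools below the threshold drop out of the analytic sample together with their students.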

Variables
The variables that are included in the analysis are those that correspond to contextual factors (mainly socio-economic and demographic), and not to process issues (non-cognitive outcomes and student attitudes or issues related to the organization or pedagogical practices of schools). Table 1 lists these variables, with the code given to each variable shown in brackets.
To avoid the inclusion of variables with a high percentage of missing values, a prior analysis of missing values was carried out following the approach of Gamazo and Martínez-Abad (2020, p. 5), who state that "All variables with high levels of missing values (more than 80%) were removed." In the present study, the threshold applied was 20%: variables with more than 20% missing values were not included in the analysis.
This analysis showed that the Spanish school database contains one variable with missing values above 20% (PROAT6), which was therefore excluded from the multi-level modeling for Spain. For the remaining variables, with missing values below 20%, we estimated the missing values using linear trend at point imputation. In the case of Ireland, the variable SCHTYPE was excluded, as no data are provided for it in the database. In the elaboration of the multi-level statistical models, we therefore included the variables indicated in Table 1, except for PROAT6 in the case of Spain and SCHTYPE in the case of Ireland.
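The two steps described here, screening variables by their share of missing values and then imputing the remaining gaps, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it reads "linear trend at point" in the usual sense of regressing the observed values on their position in the series (1 to n) and replacing each gap with the fitted value, and it represents missing entries as `None`. The function names are hypothetical.

```python
MAX_MISSING_SHARE = 0.20  # exclusion threshold stated in the text


def missing_share(values):
    """Fraction of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def linear_trend_at_point(values):
    """Fill missing entries with predictions from a least-squares line
    fitted to the observed values against their position (1..n)."""
    observed = [(i + 1, v) for i, v in enumerate(values) if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values to fit a trend")
    n = len(observed)
    mean_x = sum(x for x, _ in observed) / n
    mean_y = sum(y for _, y in observed) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [v if v is not None else intercept + slope * (i + 1)
            for i, v in enumerate(values)]


# Invented toy series: 1 of 5 values missing (20%), so the variable is retained.
series = [1.0, None, 3.0, 4.0, 5.0]
if missing_share(series) <= MAX_MISSING_SHARE:
    filled = linear_trend_at_point(series)
    print(filled)
```

Because the observed toy values lie on a straight line, the gap at position 2 is filled with a value of 2.0 (up to floating-point rounding).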

Procedure and data analysis
To carry out this research we relied on a multi-level analysis methodology, continuing the line of previous work in educational research (Aitkin and Longford, 1986; Raudenbush and Bryk, 1986; Goldstein, 1987; Gamazo García, 2019; Mang et al., 2021). As Gamazo indicates, "multi-level analysis is a statistical regression technique especially indicated for data sets in which there are observations nested in others of higher order" (2019, p. 114), in our case students and schools. This methodology allows us to test both the influence of the different levels of analysis and the relationships between variables at these levels (Murillo, 2008), which is what is pursued in this research.
In conducting the multi-level study, we used HLM 7 software, which allowed us to include plausible values for each competency in the analyses, as well as sample weights for each level of study (students and schools). This allowed us to obtain less biased results in terms of variance and error estimation (Cai, 2013). In the case of the PISA tests, the sampling weights are indicated in the database itself.
Plausible values are, in the words of Wu and Adams (2002, p. 18), "a representation of the range of abilities that can reasonably be assumed for a student." A student's sampling weight refers to the number of students in the total population that the sampled student represents.

In PISA there are also sampling weights for schools, so that the sampling weight of a school refers to the number of schools represented by that school based on its characteristics. As stated in the PISA Technical Report itself, sampling weights are necessary for the analysis of the data provided by this large-scale test in order to make valid estimates and unbiased population inferences (OECD, 2018). According to this report, "sampling weights must be incorporated to ensure that each participating student represents the correct number of students in the total PISA population" (OECD, 2018, p. 1). This, together with the use of plausible values as the measure of students' educational outcomes (Aparicio et al., 2021), constitutes the recommended statistical safeguards for the use of large-scale assessment databases.
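To make the role of these two safeguards concrete, the following sketch computes a weighted mean separately for each plausible value and then pools the results with the standard multiple-imputation combining rules (Rubin's rules). It is a simplified illustration and not what HLM does internally: the toy scores, the weights and the sampling variances are invented, and the sampling variance of each estimate is simply taken as given rather than derived from replicate weights.

```python
from statistics import mean, variance


def weighted_mean(scores, weights):
    """Mean in which each student counts in proportion to the sampling weight."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def combine_plausible_values(estimates, sampling_variances):
    """Pool one estimate per plausible value.

    Returns the pooled estimate and its standard error, where the total
    variance is the average sampling variance plus (1 + 1/M) times the
    variance between the M plausible-value estimates.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(sampling_variances)
    between = variance(estimates)  # sample variance, divisor M - 1
    total_variance = within + (1 + 1 / m) * between
    return pooled, total_variance ** 0.5


# Invented toy data: two students, two plausible values per student.
weights = [1.0, 3.0]                          # hypothetical student weights
pv_scores = [[400.0, 500.0], [420.0, 520.0]]  # one score list per plausible value
estimates = [weighted_mean(pv, weights) for pv in pv_scores]
pooled, std_error = combine_plausible_values(estimates, [4.0, 4.0])
print(estimates, pooled, round(std_error, 2))
```

The point of the design is that no single plausible value is treated as the student's score: the analysis is repeated once per plausible value and the spread between those repetitions is added to the uncertainty of the final estimate.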
In this case, there were two levels of analysis. The first was the student level, for which we took the data from the student database provided by the OECD for the 2018 iteration of PISA. The second was the school level, for which we took the data from the school database, also provided by the OECD for PISA 2018 and available in open access. Moreover, in the case of Spain, as we also had data referring to the personal and contextual characteristics of teachers, we included these data in the analyses, not as a third level, but as aggregated values according to the school to which the teachers belonged.
The modeling presents a mixed design of fixed slopes and random intercept (Gamazo et al., 2017; Ertem, 2021). The random intercept allows the dependent variable to have different values for each school. The fact that the values of the slopes are fixed implies that the effects of the different covariates included in the model remain fixed regardless of the school.
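A random-intercept, fixed-slope design of this kind can be written out as follows. The notation is illustrative and is not reproduced from the authors' models: $Y_{ij}$ is the score of student $i$ in school $j$, $X_{pij}$ are student-level covariates, $W_{qj}$ are school-level covariates, and the variance symbols $\delta^2$ and $\tau_{00}$ are chosen to match those used in the goodness-of-fit section.

$$
\begin{aligned}
\text{Level 1 (students):}\quad & Y_{ij} = \beta_{0j} + \sum_{p=1}^{P} \beta_{pj} X_{pij} + r_{ij}, \qquad r_{ij} \sim N(0, \delta^2) \\
\text{Level 2 (schools):}\quad & \beta_{0j} = \gamma_{00} + \sum_{q=1}^{Q} \gamma_{0q} W_{qj} + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00}) \\
& \beta_{pj} = \gamma_{p0}, \qquad p = 1, \dots, P
\end{aligned}
$$

The school-specific term $u_{0j}$ is what makes the intercept random, while the last line, which has no error term, expresses that each slope takes the same value in every school.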
The next stage of the study involved the use of HLM 7.0 software in the following sequence:
1. Calculation of the null model and its corresponding Intraclass Correlation Coefficient (ICC).
2. Construction of the models with the significant variables (p < 0.05).
3. Calculation of the goodness of fit of the models.

Results
In any multi-level analysis it is necessary to check whether the data are nested by levels to assess the suitability of the methodology. This check of prior assumptions is done by calculating the ICC of the null model (a model composed only of the intercept, without any covariate), which reports the percentage of variance in the performance of the participating students that can be explained at the second level (schools). As a general rule, this percentage should be higher than 10% to be considered suitable for a multi-level analysis (Lee, 2000), and this is true for all skills in both countries (Table 2).
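As a minimal sketch of this check, the ICC can be computed from the two variance components of the null model as the school-level variance divided by the total variance. The variance values below are invented for illustration and are not the figures behind Table 2; the function name is hypothetical.

```python
ICC_THRESHOLD = 0.10  # rule of thumb cited in the text (Lee, 2000)


def intraclass_correlation(between_school_var, within_school_var):
    """Share of total variance in the null model that lies between schools."""
    return between_school_var / (between_school_var + within_school_var)


# Hypothetical variance components from a null model.
icc = intraclass_correlation(between_school_var=2000.0, within_school_var=6000.0)
print(f"ICC = {icc:.1%}, multi-level analysis suitable: {icc > ICC_THRESHOLD}")
```

With these invented components, a quarter of the variance in scores lies between schools, which would clear the 10% rule of thumb.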
After checking this prior assumption, and applying the steps mentioned in the method section of this paper, in each of the three competences in both countries, we obtained six statistical models, the results of which are presented in Tables 3-5.

Competence in mathematics
The variables that made up the final model of mathematical competence are those included in Table 3. Regarding the weight of each of these variables in the statistical models, it was unsurprisingly found that, in both models, the variable with the greatest positive influence on differences in student performance by school is socio-economic status. That is, the higher the socio-economic status of students, the higher their probability of obtaining better results in the PISA tests in this skill. Within the Spanish model, this variable has more relative relevance, as can be seen from its higher t-ratio. However, if we compare the coefficients, we can see that in Ireland the impact of a 1-point difference in the ESCS variable is greater, with a difference in performance of around 20 points, while in Spain the difference in performance is approximately 13 points.
Turning to the variables that have a negative influence on students' performance in mathematics, we find some differences. In Spain, the variable with the strongest negative influence is the gender of the students, closely followed by grade repetition. In Ireland, it is this second variable, grade repetition, that has the strongest negative influence on performance in mathematical competence.

Competence in reading comprehension
In the case of reading comprehension competence, the variables that make up the multi-level statistical models are those shown in Table 4. The models for the two countries share five student-level variables: socio-economic level, gender, grade repetition, age, and the number of school changes.
In the case of Spain, the final model for this competence is made up of four school-level variables and seven student-level variables, while in Ireland the model is made up of a total of seven variables, one of them at the school level and the other ones at the student level.
If we refer to the variables with the greatest positive and negative influence in each of the models, we find that, in both cases, the variable with the greatest negative influence on student performance is repetition. In Spain, there is an average difference of 55 points in performance between repeaters and non-repeaters; in the case of Ireland, this difference is smaller, around 37 points. The variable with the greatest positive influence is, in both cases, the socio-economic level of the students. This means that students who have repeated a grade have lower expected performance in the PISA tests, and that the higher the socio-economic status, the higher the probability of high student performance in the competence under study.

Competence in science
Table 5 presents the variables that make up the final model of science proficiency for both countries. Analyzing the variables that make up the statistical models of science proficiency for both countries, we find only three common variables, all of them referring to student-level characteristics. The Spanish model consists of 11 variables, 4 of them at school level. In contrast, the Irish model is made up of 5 variables, only one of which is at school level.
Regarding the variables with the greatest positive and negative influence in each model, in Spain the variable with the greatest positive influence on student performance in science is the socio-economic level of the students, while the greatest negative influence is grade repetition. In Ireland, the variable with the strongest positive influence is also student socio-economic status. In contrast, the variable with the strongest negative influence is the fact that the student is a second-generation immigrant, closely followed by grade repetition.

Model goodness of fit summary
In order to check the fit of each of the models and in accordance with Raudenbush and Bryk (2002), we proceeded to calculate the Pseudo R² statistic, which is used to find out how much of the variance is explained at each level. To calculate this statistic for each of the levels, the variance components of the six models calculated (null and conditional for each competence) are used by applying the following equations (Equations 1, 2), expressed according to Hayes (2006). The results are reported as part of the Supplementary material.

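$$\text{Pseudo } R^2_{\text{Level 1}} = \frac{\delta^2_{\text{Null}} - \delta^2_{\text{Final}}}{\delta^2_{\text{Null}}} \qquad (1)$$

$$\text{Pseudo } R^2_{\text{Level 2}} = \frac{\tau_{00,\text{Null}} - \tau_{00,\text{Final}}}{\tau_{00,\text{Null}}} \qquad (2)$$

where $\delta^2$ is the variability among level 1 units and $\tau_{00}$ is the variability among level 2 units.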
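The calculation amounts to a proportional reduction in variance between the null and the final model, applied once per level. A minimal sketch follows; the variance components are invented for illustration and are not the values reported in the Supplementary material, and the function name is hypothetical.

```python
def pseudo_r2(null_variance, final_variance):
    """Proportion of the null-model variance at one level that the final model explains."""
    return (null_variance - final_variance) / null_variance


# Hypothetical variance components (null model vs. final model).
level1 = pseudo_r2(null_variance=6000.0, final_variance=4800.0)  # student level
level2 = pseudo_r2(null_variance=2000.0, final_variance=500.0)   # school level
print(f"Level 1: {level1:.1%}, Level 2: {level2:.1%}")
```

In this invented case the final model explains a fifth of the student-level variance and three quarters of the school-level variance, the same qualitative pattern (level 2 above level 1) that the results section reports.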
Table 6 presents the values of the indicated statistic for each competence and at each of the two levels studied.
The Pseudo R² values for the first level of analysis, across the three skills in both countries, ranged from 7.96% for science proficiency in Ireland to 27.62% for mathematics proficiency in Spain. At the second level of analysis, results ranged from 64.8% for mathematics in Spain to 81.97% for reading literacy in Spain and Ireland. The values of this statistic follow a certain trend: the value at the second level is always higher than at the first level, in all competences and in both countries. Furthermore, comparing the results for Spain and Ireland, Ireland presents higher values of Pseudo R² at the second level, while Spain does so at the first level, where it obtains much higher percentages of explained variance than Ireland.

Discussion
The aim of this research was to determine the contextual factors associated with the school performance of Spanish and Irish students in the three main competencies studied by the PISA tests and to compare them. Applying multi-level modeling and starting from a series of variables based on the previous literature review, we obtained the final statistical models with the variables that have a significant effect on each skill in each country. This allowed us to establish the similarities and differences between the models, which were presented in the results section of this study.
As has been shown, there are similarities in the patterns of the three competences both between Spain and Ireland and between the different competences in the same country. In fact, there are three variables that are repeated in all of them: the socio-economic level of the students, the repetition rate, and the age of the students. These are among the factors that previous studies most frequently report as having significant effects on students' academic performance (Lenkeit, 2012; Huang and Sebastian, 2015; Julià Cano, 2016; Sortkaer and Reimer, 2018), and we observe that the three variables repeated in all the models have similar influences in each of the countries. Grade repetition has a negative influence on the performance of Spanish and Irish students in mathematical competence, scientific competence and reading literacy, although with higher values in the case of Spain. In contrast, socio-economic status and the age of students have a positive influence on student performance in all three skills, with very similar values between the two countries (Tables 3-5).
Furthermore, there are similarities between the factors that make up the statistical models of the different competences of the same country. In the case of Spain, the variables that coincide in the three models developed are as follows:
- At the school level, the shortage of teachers in the educational center and the size of the school, with school size having a positive influence on performance in the three competences and the shortage of teachers having a negative influence.
- At the student level, socio-economic status, grade, age, gender, grade repetition and number of school changes. The first three variables have a positive influence on performance in the three subjects studied. On the other hand, repetition and the number of school changes have a negative influence on students' performance in all three competences.
The case of students' gender is striking, since according to our statistical models and in line with previous literature, Spanish female students perform better in reading, while in STEM subjects boys perform better. We can therefore infer that, among Spanish students, there are differences in academic performance based on gender. The results obtained are in line with previous research in the field that supports, above all, the influence of gender on mathematics performance in favor of boys (Ruiz de Miguel and Castro Morera, 2006; Ruiz de Miguel, 2009; Burger and Walk, 2016; Sortkaer and Reimer, 2018), as well as other studies that support our finding that girls perform better in reading comprehension (Shera, 2014; Julià Cano, 2016; Tan and Liu, 2018; Van Hek et al., 2018; Ertem, 2021).
In Ireland, the variables that coincide across the three models are:
- At the school level, the average socio-economic status of students in the school: the higher this index, the higher the performance in all three competencies.
- At the student level, socio-economic status, student age, grade repetition and second-generation immigrant status are the variables with significant influence on the three competences. The first two are associated with an increase in performance when their values are higher, while the last two have a negative influence on the performance of Irish students.
These results are in line with previous research that studied variables related to students' academic performance using PISA test data from different countries, including Ireland, such as the study by Rodríguez-Santero and Gil Flores (2018), in which socio-economic status is highlighted as a variable with an influence on academic performance. These authors also indicate that "lower performance has been found in immigrant students than in native students" (2018, p. 17), in line with other studies (Meunier, 2011; Martin et al., 2012).
Having described the main similarities and differences between the two countries' models, it would be useful to analyze what might be the causes at the social and education system level that can help to explain them. Fundamentally, three striking issues emerge: in Spain, the socio-economic status of students at school level is not significant; in Ireland, the shortage of teachers does not seem to affect student performance; and in Spain, immigrant status is not a disadvantage for student performance.
Starting with the first point, the difference in significance of the average socio-economic level of the student body, it is worth analyzing whether there are differences between countries in terms of the social composition of the schools, i.e., to what extent the schools are homogeneous or heterogeneous with respect to the socio-economic level of their student body. To this end, we calculated the initial ICC for the variable ESCS in both countries. The results showed that in Spain the school level accounts for 24.5% of the variance of ESCS, while in Ireland it accounts for 19.3%. While the data are somewhat dissimilar, it is not clear that the difference is enough to explain the difference in the multi-level models. In this regard, future research should investigate in greater depth why school-level ESCS is significant in Ireland but not in Spain.
Furthermore, it is also worth analyzing issues related to the data provided by the ICCs of the countries. Firstly, the initial ICC value of each model in each country gives us valuable information about the level of equity of the education systems in terms of differences between schools. The ICC tells us to what extent there are differences in student performance that are attributable to the second level of analysis, i.e., the school. Thus, the higher this value, the more differences there are between schools in a country, or, in other words, the more the performance of a particular pupil varies according to the school where he or she is enrolled. For this reason, it is understood that the lower the ICC, the more equitable the system can be, since the differences between schools are smaller. In the present case, and as can be seen in Table 2, although the figures for the Spanish models are slightly lower, both countries show a similar level of variability in performance attributable to the influence of the school.
To compare the variance explained by the variables at each of the levels of analysis, we use the Pseudo R² statistic, as indicated above. From these values we can determine that, as a rule, and even though a greater number of significant variables referring to student characteristics appear in the final models, these variables explain smaller proportions of variance than those explained by the second-level variables. While the percentage of explained variance is acceptable in most cases (Ozili, 2022), the differences between the countries point to a need to search for more student-level predictors in the case of Ireland, and more school-level predictors in the case of Spain, since in each case the respective figure is lower than that of the other country.
In conclusion, and notwithstanding the results obtained, this study has limitations that should be considered when assessing its results. Firstly, it is worth highlighting those limitations that are inherent to the database from which the analysis was taken. Although the PISA database is very broad and allows a large number of analyses to be carried out both within each country and comparatively, as in the case of the present study, some of its intrinsic characteristics, such as its cross-sectional nature and the lack of data at the classroom level, limit the scope of the results and the inferences that can be drawn from them. Furthermore, the present study is limited to the comparative analysis of two countries that are relatively close in terms of geographical location and cultural characteristics, so this may also be a limitation when considering aspects of the cultural, social, and economic macro-system that may have an impact on the results of the analysis.
In this regard, with a view to future research, there are some issues that would be interesting to examine in greater depth to be able to interpret the results in a more meaningful and useful way that will help to improve the quality of education systems. Firstly, there is a need to study the possible social, economic, and cultural causes that may explain the differences in terms of which variables are significant in each country, to assess what possible measures may be appropriate to reduce their impact on student performance. Finally, it would also be of value to carry out an interaction analysis to check whether the variables that are significant in each of the models interact with one another. This would help us to better understand the complex system of interconnection that exists between all the variables and to establish measures aimed at mitigating their effects in a more informed and effective way.

Frontiers in Education, frontiersin.org, doi: 10.3389/feduc.2024.1306197

TABLE 1
Variables for the study of factors associated with academic performance based on the results of PISA 2018.

TABLE 2
Intraclass correlation coefficient of the null model.

TABLE 3
Statistical multi-level final models of mathematical competence.

TABLE 5
Statistical multi-level final models of science competence.

TABLE 6
Variance explained at each level.