Scale to measure student perception in collaborative online international learning experiences: design and validation

century. Moreover, it can potentially address global educational needs that have not been fulfilled in the past. Methods: This study conducted a content validation of a scale (instrument) intended to measure students' perception of collaborative online international learning (COIL) experiences in three dimensions: (a) interaction among students in digital environments, (b) collaborative work in multicultural teams to achieve goals, and (c) peer reflection on differences and similarities during collaboration. The study employed the Delphi method of expert judgment. Results: In the overall scale, Aiken's V values indicated that the clarity criterion did not attain an acceptable score. Therefore, a review is needed to determine which instrument items require reformulation. However, Aiken's V scores met acceptable coherence, relevance, and sufficiency values. Conclusion: The proposed scale contributes to research on collaborative online international learning experiences, serving as a valuable tool for future investigations, particularly those focused on measurement, and as a reference for evaluating COIL experiences among students.


Introduction
Formal and informal online education is becoming increasingly common due to the diverse options available on the internet and the efforts governments are making to achieve the fourth Sustainable Development Goal: quality education. Focusing on formal higher education, governments act significantly to ensure quality and widespread education in their countries. This is evidenced by the fact that from 1970 to 2020, enrollment in higher education institutions increased yearly worldwide (World Bank, 2024). Confirming this, Mok and Marginson (2021) also reported that East Asia's enrollment exceeded 50 million students in 2018, consistent with the global acceleration and the trends of massification, diversification, and internationalization. These figures do not distinguish learning modalities (face-to-face, distance, blended, or virtual) (Imran et al., 2023). However, focusing on non-formal online education, it can be affirmed that what Noam (1995) predicted in the 1990s has come true: with e-learning, universities would not be the protagonists, but commercial enterprises would provide sophisticated courses on the web. The data show that many higher education institutions have created virtual or online programs, extending coverage to previously unreachable contexts. The online programs intend to create memorable digital experiences for each learner through active, student-centered methodologies.
Although there are still barriers and gaps in internet access, platforms offering courses, micro-courses, certifications, Massive Open Online Courses (MOOCs), and academic information have proliferated across various channels and media. Some are backed by academic institutions, others by the instructor's reputation, and others align with current technological trends, all advocating for educational innovation that meets the training needs of the current moment. However, the effectiveness of these online learning offerings is not primarily determined by the factors mentioned earlier. Instead, the effectiveness of e-learning primarily requires collaboration and student involvement with the proposed activities and strategies (Alyoussef, 2023). Along these lines, research by Baloran et al. (2021) and Elshami et al. (2022) emphasizes that student engagement is one of the main challenges of e-learning. While there is generally a high level of student satisfaction with courses developed using e-learning, the concern lies in achieving student participation in these complex learning processes. Furthermore, several studies have found that participation indicates student engagement and correlates with performance and satisfaction (Greller et al., 2017; Sinclair et al., 2017; Rajabalee et al., 2020). On the other hand, some perceptions about e-learning assert that online learning is merely the use of technologies for information dissemination (Tajik and Vahedi, 2021). The issue is that e-learning is as broad as face-to-face education; the teacher's intention to adopt strategies employing digital technologies that support and enhance planned contextual activities is critical. Other determinant elements include connectivity, accessibility, usability, and required skills.

Conceptualizing COIL
COIL experiences are a digital educational alternative that connects two or more educational institutions in a student-centered online learning environment to address, understand, or discuss a global issue of interest to the parties. Some research has labeled it a pedagogy that creates environments for developing intercultural skills and competencies using technologies connecting two classrooms in different geographical areas (Appiah-Kubi and Annan, 2020; Zapatero et al., 2022).
On the other hand, Davis et al. (2023) define COIL as an innovative pedagogical approach used worldwide, providing teachers and students with the opportunity to engage in a global learning activity where they learn from and about each other (Marcillo-Gómez and Desilus, 2016). In addition to the predominant focus on interculturality and internationalization, it is also an innovative and cost-effective pedagogical tool that offers global learning opportunities from home (Vahed and Rodriguez, 2020; Liu and Shirley, 2021). Notably, COIL experiences have different conceptions, but all are closely related to internationalization, global learning, internet use, collaboration, and interculturality.

COIL and competencies for the digital era
Up to this point, it can be observed that digital education is still under construction. Although it has been massified, inquiry into its results and implications continues. Various actors are involved in creation, production, and delivery. However, universities and higher education institutions are the foremost correspondents, as they have responded to the needs of different eras throughout history. Current education must integrate interculturality, address problems beyond the classroom, pay attention to digital nomads, innovate with tangible solutions that benefit communities and society as a whole through digital means, and meet specific demands (Tajpour and Hosseini, 2021; Hajimiri et al., 2023; Mukul and Büyüközkan, 2023). Collaborative online international learning (COIL) courses address several of the abovementioned elements and serve as a strategy for active digital learning that vigorously fosters competencies in students who benefit from the dynamics of these experiences (Watla-iad and Kradtap Hartwell, 2022). This approach uses digital media to foster synchronous and asynchronous online collaboration among students and teachers in different geographical contexts. It creates equitable, innovative learning environments that align with 21st-century classrooms, enhancing cultural sensitivity and improving learning experiences (Borger, 2022). The strategy described here is inherent to the digital age but requires further research and evidence to support optimal and effective implementation.
COIL experiences are increasingly being implemented in universities for various reasons beyond internationalization or interculturality. They were not only used during the COVID-19 pandemic, although that situation gave them momentum at the time because it was a means to address internationalization processes without leaving home; they have been utilized for over two decades. They currently emerge as a coherent educational solution to the digital environments presented.
The nature of COIL experiences renders some competencies inherent and natural, such as interculturality, since these experiences typically involve two groups of students from educational institutions in different countries (West et al., 2022; Hackett et al., 2023). Other inherent competencies include strengthening a second or foreign language, in some cases English (Pouromid, 2019; Helena and Alena, 2021). However, other less direct competencies have been identified, such as job training (Nethsinghe et al., 2023). This competency may not be perceived in all studies or experiences, as it depends on the intention of the planned COIL. However, most COIL courses are designed with an initial phase of discussion and argumentation about a specific case, situation, or problem (Ingram et al., 2021; House et al., 2022). Some studies provide evidence of a strategy employing a real challenge or problem that promotes contextualized interdisciplinary work and requires a possible solution. In such cases, student involvement is greater, and the bond is very close; therefore, the competencies and skills developed differ. The content tends to be global and proposes complex situations (Suarez and Michalska Haduch, 2020; Salmon et al., 2022). As can be observed, studies report the benefits of developing various competencies that participants positively perceive. However, validated instruments that measure students' perception of COIL experiences in the three dimensions illustrated in Figure 1 are not found (Goto and Gutierrez-Gomez, 2024; Mestre-Segarra and Ruiz-Garrido, 2022; Wimpenny and Orsini-Jones, 2020).

Student interaction in digital environments
The construction of learning through social interactions allows individuals to build their knowledge based on their experiences and the use of digital media, enabling synchronous or asynchronous collaboration, communication, and creation in previously unimagined spaces (SUNY COIL CENTER, 2013; Weller et al., 2020).

Collaborative work in multicultural teams to achieve objectives
The classroom is conceived as a community of individuals who, regardless of their culture and language, engage collaboratively and enrich discussions and dialogues from their respective viewpoints. This facilitates understanding the topics or issues addressed with a broader and more diverse perspective, positively impacting learning outcomes and skills development (SUNY COIL CENTER, 2013).

Reflection on differences and similarities with peers during collaboration
This has a student-centered focus stemming from involvement and active roles in these courses, where the intercultural exchange is facilitated through activities that allow for a broader global dimension of course content, fostering reflection among all involved stakeholders (SUNY COIL CENTER, 2013).
This brief conceptualization of the dimensions suggests that COIL experiences profoundly affect students' comprehensive and generic understanding; they are the foundations upon which these courses are built. For this reason, a questionnaire-type instrument was designed with the purpose of being used in various COIL courses to effectively measure the above-described constructs.
The aim was to collect data that allows for the systematization of experiences and potentially generate a theory postulating the principles upon which COIL experiences should remain grounded.

Method
Instrument design
For the design of the instrument, a literature review was conducted to determine the appropriate dimensions to include in the scale. For this purpose, different measurement instruments developed to assess COIL experiences in higher education students were reviewed. As a result of this review, three dimensions for measuring COIL experiences were determined, and the initial proposal of the scale was organized around them. After the constructs were selected and defined, the items that would make up the scale were designed and adapted. For cultural sensitivity, teachers and students who had participated in COIL experiences reviewed the items internally to confirm that they were clear in context. After this internal review, the first version was prepared for content validation by expert judgment. The Questionnaire on the perception of the student's role in the Collaborative International Online Learning Experiences (COIL) can be consulted at: https://doi.org/10.6084/m9.figshare.26210861.v1.

Participants
For the selection of the members of the expert judgment panel, a search was carried out in Scopus for authors on the subject of COIL. The inclusion criterion was that authors had published at least three articles on the subject as first author or corresponding author. The selected candidates were invited by e-mail to participate; 45 invitations were sent, but only 18 experts participated.
The sample comprised 18 experts who validated the content of the previously designed questionnaire. In total, 11 women and seven men participated, with 12 holding a doctoral degree. The experts came from institutions in more than six countries, with the highest representation from the United States, Spain, Mexico, and Colombia. Their professional experience ranged from 6 to 32 years (see Table 1).

Relevance
The item has some relevance, but another item may be assessing the same.
The item is relatively important.
The item is highly relevant and should be included.

Sufficiency
The items are insufficient to measure the construct.
The items measure some aspect of the construct.
The number of items should be increased to assess the construct fully.
The items are sufficient to measure the construct.

Procedure
For the validation of the content, the Delphi method was used, a technique for achieving consensus among the members of a panel of experts. This method organizes the communication process of the panel around the evaluation of a problem over several rounds; in this case, the problem was the design of a measurement instrument (García-Valdés and Suárez-Marín, 2013; López-Gómez, 2018). The intention of the Delphi methodology is to collate individual opinions until they reach a statistically generated consensus with collective intelligence (Nasa et al., 2021). In this case, the experts judged the items that made up the measurement scale against four evaluation criteria: clarity, coherence, relevance, and sufficiency. A matrix was developed that defined the elements to be evaluated and the values to be assigned, which ranged from 1 ("criteria not met") to 4 ("high level") (see Table 2).
The expert judgment process consisted of two stages: the first involved responding to questions related to professional variables, and the second involved evaluating the items comprising the instrument, applying the established evaluation criteria. Lastly, experts provided comments and suggestions for improvement for each item (Escobar-Pérez and Cuervo-Martínez, 2008). For the analyses, the measures of central tendency and normality of the data were taken into account. With respect to the normality of the data, skewness and kurtosis values between −2 and +2 were taken as the acceptable reference range (George and Mallery, 2001). Regarding Aiken's V coefficient, values above 0.75 were taken as the criterion for retaining an item (Wilcox and Serang, 2017; Aw, 2019). Items with values below this threshold were considered to need further review.
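As an illustration of the retention rule described above, the following minimal sketch computes Aiken's V for a single item in its usual form, V = S / [n(c − 1)], where S is the sum of each rating minus the lowest scale value, n is the number of judges, and c is the number of rating categories. The function names `aiken_v` and `is_retained`, the treatment of a value exactly equal to 0.75, and the example ratings are assumptions made for illustration; they are not taken from the study's data or analysis scripts.

```python
def aiken_v(ratings, lo=1, hi=4):
    """Aiken's V for one item rated by n judges on a lo..hi scale.

    V = sum(r - lo) / (n * (hi - lo)); with a 1-4 scale, hi - lo = c - 1 = 3.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < lo or r > hi for r in ratings):
        raise ValueError("rating outside the scale range")
    n = len(ratings)
    return sum(r - lo for r in ratings) / (n * (hi - lo))


def is_retained(v, threshold=0.75):
    # Assumption: a value exactly at the threshold does not count as "above" it.
    return v > threshold


# Hypothetical ratings from 18 judges for one item (not data from the study).
example_ratings = [4, 4, 4, 3, 4, 3, 4, 4, 2, 4, 3, 4, 4, 3, 4, 4, 3, 4]
v = aiken_v(example_ratings)
print(f"V = {v:.2f}, retained: {is_retained(v)}")
```

V equals 0 when every judge gives the lowest rating and 1 when every judge gives the highest, so the 0.75 threshold corresponds to a mean rating of 3.25 on the 1 to 4 scale.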

Results
The criterion of clarity in the wording of the scale items
Table 3 shows the results of the measures of central tendency, normality, and Aiken's V scores for the clarity criterion for each item of the International Online Collaborative Learning Perception Scale (COIL) assessed by the expert judges. The means indicate that experts considered the items to have a moderate level of clarity in their wording, except items 6 and 9, whose means suggested unclear wording. Regarding the distribution of the data, with the exception of items 3, 11, 14, 16 and 18, the rest presented normality values considered acceptable.
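The normality screening reported for each item can be sketched as follows, using the ±2 reference range for skewness and kurtosis. This is only an illustrative approximation: it uses simple moment-based estimators, whereas statistical packages usually apply bias-adjusted formulas that give slightly different values for small samples. The function names `skew_kurtosis` and `within_reference_range` are hypothetical.

```python
def skew_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis


def within_reference_range(values, limit=2.0):
    # True when both statistics fall between -limit and +limit.
    skew, kurt = skew_kurtosis(values)
    return abs(skew) <= limit and abs(kurt) <= limit


# Hypothetical example: 17 judges rate an item 4 and one judge rates it 1.
print(within_reference_range([4] * 17 + [1]))
```

Items on which almost all judges agree tend to fall outside the range because a single dissenting rating produces strong skew, so an out-of-range value can coexist with a high Aiken's V.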
Regarding content validity through expert judgment, items with an Aiken's V above 0.75 were retained. Items that did not meet this criterion should be reassessed for necessary adjustments in their wording. The obtained values ranged between 0.50 and 0.87 points. Eleven of the 19 items comprising the scale were considered clear in their wording and could be considered for measuring the construct. However, eight (items 1, 5, 6, 8, 9, 12, 16, 19) had values below the established reference value, indicating poor clarity of wording. Therefore, these items needed to be adjusted and reevaluated by expert judgment.
The criterion of coherence in the wording of the scale items
Table 4 reports the measures of central tendency, normality, and Aiken's V scores for the items' coherence criterion. The scores reflect that the experts considered the items coherent for what they intended to measure. Regarding normal distribution, the items exhibited skewness and kurtosis values within acceptable ranges, except for items 2, 3, 4, 8, 10 and 11, which exceeded acceptable normality values. Concerning content validity, Aiken's V values show that of the 19 items comprising the scale, 17 attained values considered acceptable (>0.75). However, two (items 9 and 17) need to be reviewed for coherence in the construct.
The criterion of relevance in the wording of the scale items
Table 5 reports the measures of central tendency, normality, and Aiken's V scores for the items' relevance criterion. The scores reflect that experts considered the items relevant for the intended measurement. Regarding the distribution of the data, with the exception of items 1, 2, 3, 4, 5, 6, 7, 8, 8, 10, 11, 13, 14, 16 and 18, the rest presented values of normality considered acceptable.
Concerning content validity, Aiken's V values indicated that 18 of the 19 items comprising the scale were acceptable (>0.75).However, one item (item 9) must be reviewed to assess its relevance for inclusion in the Perception of collaborative online international learning (COIL) experiences scale.
Table 6 reports Aiken's V values for the clarity, coherence, relevance, and sufficiency criteria for the three dimensions comprising the scale and the overall scale. In Dimension 1, which corresponds to the interaction among students in digital environments, it was found that the clarity of item wording and sufficiency of items should be addressed by the authors who designed the scale to attain acceptable values and ensure that these criteria are satisfactorily met. In the second dimension, referring to collaborative work in multicultural teams to achieve objectives, it was found that the clarity criterion for item wording did not meet the minimum required value; however, the coherence, relevance, and sufficiency criteria achieved values above the suggested threshold. The third dimension of the instrument, concerning reflection on differences and similarities among peers during collaboration, met all criteria assessed by the experts, meaning that the items comprising this dimension were clear, coherent, relevant, and sufficient for the intended measurement. Finally, in the overall score of the scale, Aiken's V values indicated that the clarity criterion did not attain the acceptable score, prompting a review to identify which items need to be reformulated. However, in terms of coherence, relevance, and sufficiency, the overall Aiken's V scores had acceptable values.

Discussion
Measuring international online collaborative learning is essential to understanding and improving the effectiveness of these global educational experiences. This study aimed to assess, through the Delphi method, the content validity of the instrument that measures the perception of Collaborative International Online Learning (COIL) experiences. For this purpose, the criteria of clarity, coherence, relevance and sufficiency were considered.
The results obtained from the evaluation of the criteria of clarity, coherence, relevance and sufficiency of the collaborative online international learning (COIL) instrument showed important aspects to be considered in the design of the measurement scale. The item-level evaluation of clarity showed this criterion to be weak: eight of the nineteen items did not achieve acceptable values of Aiken's V (Ventura-León, 2019). For item coherence, the reported values suggest that the items are suitable for measuring the construct, with the exception of two items that need to be reevaluated to ensure that this criterion is effectively met and thus support their coherence and inclusion in the measurement scale. Regarding the relevance of the items, Aiken's V values were considered acceptable, which indicates that the items designed are relevant to measure the construct, with the exception of only one item that did not meet the suggested values.
These findings show that although the coherence and relevance of the items can be considered adequate, the clarity of the items should be reevaluated to make the necessary adjustments. One possible cause of the lack of clarity in the items is the absence of a consensus on the definition of COIL. This also affects the operationalization of the dimensions that should be included to measure the construct. Although there are a few studies on this topic, they report different evidence of validity. However, none of these studies consider the dimensions included in this instrument (Deardorff, 2006; Razali et al., 2016; Biasutti and Frate, 2018; Shimizu et al., 2020; Palacios-Núñez et al., 2023). It should be clarified that COIL has been used for different educational purposes, such as fostering motivation, strengthening or developing internationalization and multiculturalism, favoring different competencies, positively impacting learning, or addressing global issues, which makes it difficult to measure. For this research, we start from what has been promoted by the State University of New York (SUNY) and its creators (Rubin and Guth, 2022), based on three elements that this methodology promotes: collaboration, interaction and intercultural exchange. Each of the constructs could be measured separately, or some could be added according to interest or purpose, but the value of COIL is based on the convergence of the triad that sustains them. It is key to mention that this type of instrument is necessary for institutions and teachers to be able to measure with confidence the effectiveness of the efforts pursued in this type of COIL experience. This makes it possible to continue building more robust experiences and to obtain a more holistic understanding of online learning environments and the constructs on which they are based. For measuring a specific construct, it is recommended to use instruments or models designed for this purpose, such as those for intercultural competence (Deardorff, 2006; Hofhuis et al., 2020) or collaboration and interaction related to learning (Collazos et al., 2007), to cite a few examples. But if the intention is to measure students' perception of a COIL experience in a general way, this instrument is a relevant and useful tool for this purpose.
In addition to the item-level evaluation, the scores were also analyzed at the dimension and global level; in this part of the analysis, the sufficiency criterion was included. This made it possible to identify which of the three dimensions that make up the instrument (interaction among students in digital environments, collaborative work in multicultural teams to achieve objectives, and reflection on differences and similarities with peers during collaboration) present values that imply a restructuring of the scale. The scores in the dimension of interaction among students in digital environments need to be subjected to a second evaluation, since the clarity and sufficiency criteria did not attain acceptable values. The Aiken's V values of the dimension of collaborative work in multicultural teams for the achievement of objectives indicate that the criterion of clarity should be addressed again to verify those items that do not reach the desired scores and to ensure that the items are precise in their wording; however, the other criteria (coherence, relevance and sufficiency) meet acceptable values. Finally, the third dimension, referring to reflection on differences and similarities with peers during collaboration, meets all the criteria evaluated by the experts. This reflects that the items that make up this dimension are clear, coherent, relevant and sufficient to measure the construct. The results obtained by dimension show that the weakest criterion is clarity, which implies that the items that make up the scale should be reviewed and their wording rethought.
Regarding the overall scale scores, once again, the authors need to address the criterion of clarity in the wording and readjust the items that did not reach the minimum acceptable scores per item, which compromised the scores per dimension and the overall scale. However, a positive aspect is the acceptable values in the other criteria (coherence, relevance and sufficiency), which support the content validity of the instrument measuring collaborative online international learning (COIL) experiences.

Conclusions
In this study, expert judges evaluated the content validity of an instrument designed to measure the perception of International Collaborative Online Learning Experiences (COIL) for three proposed dimensions using the criteria of clarity, coherence, relevance and sufficiency. The need to review the clarity of certain items was identified, as eight of the nineteen items did not reach the suggested values for retention on the scale, implying a possible need for reformulation to improve comprehension. Although some items required adjustments in coherence and relevance, the majority of the scale items demonstrated acceptable values for these criteria, supporting its content validity in measuring perceptions of COIL experiences. Despite the challenges identified, this study provides a valuable tool for future research on international online collaborative learning.

Limitations and future research
This study contributes to research on collaborative international online learning experiences, specifically in the field of measurement. Therefore, this study can serve as a reference for evaluating students' COIL experiences. However, the results need to be approached with caution due to the inherent limitations of this study. For example, the evaluation of expert judgment was based on the opinion of a specific group, which could limit the generalizability of the results. A larger sample of experts could provide a more robust validation of the instrument. Further rounds among the experts are also needed to achieve consensus on the dimensions assessed. In addition, although this study presents a comprehensive assessment of content validity, it is necessary to evaluate other validity evidence, such as construct validity, convergent and divergent validity, measurement invariance, and latent mean analysis, which could provide information about the behavior of the data in the measurement model.
For future research, studies should be conducted that address the measurement of collaborative international online learning experiences, focusing on understanding the COIL construct through the proposed dimensions. In addition, it is essential to develop and validate specific instruments that focus on critical aspects of COIL contexts, such as intercultural communication and conflict resolution. Finally, longitudinal research is needed to provide a deeper understanding of the evolution of students' perceptions over time and the influence of COIL on students' academic performance and satisfaction.

FIGURE 1 Perception of COIL experiences.
TABLE 1 Demographic information of the expert judgment participants.
TABLE 2 Reference values for expert judgment evaluation criteria.
TABLE 3 Central tendency, normality, and Aiken's V calculated for item clarity in the Collaborative Online International Learning (COIL) scale.
TABLE 4 Central tendency, normality, and Aiken's V calculated for item coherence in the Collaborative Online International Learning (COIL) scale.
TABLE 5 Central tendency, normality, and Aiken's V calculated for item relevance in the Collaborative Online International Learning (COIL) scale.
TABLE 6 Aiken's V values for the clarity, coherence, relevance, and sufficiency criteria reported for the three dimensions comprising the scale and for the overall scale.