Findings From a Two-Year Effectiveness Trial of the Science Notebook in a Universal Design for Learning Environment

This cluster randomized controlled trial examines the effects of the Science Notebook in a Universal Design for Learning Environment (SNUDLE) on elementary school students' science academic achievement and motivation outcomes. Multilevel analyses examined the impact of SNUDLE for all students and important student subgroups. Overall, students who received the SNUDLE intervention had similar motivation and academic achievement in science to their peers who did not receive the SNUDLE intervention. However, relative to students with disabilities in the comparison group, students with disabilities who used SNUDLE scored significantly higher on motivation in science and science academic achievement, with effect sizes (ES) ranging from 0.82 to 1.01. Furthermore, SNUDLE appeared to have a small but statistically significant positive impact on science academic performance among students whose home language is other than English or Spanish, with an ES of 0.35. The fidelity-of-implementation analysis shows sufficient teacher training, but the fidelity of teacher and student usage of SNUDLE needs to be improved. The qualitative analysis of teacher interviews suggests that teachers perceived benefits of SNUDLE in supporting language acquisition and science writing skills. Both quantitative and qualitative findings suggest that SNUDLE holds promise for improving academic performance in science and confidence and motivation among some of the most vulnerable student populations.


INTRODUCTION
Research on effective science learning shows that conducting experiments and recording data take up most of the allocated time in today's elementary school science classrooms (Fairbanks, 2013). After an experiment, the teacher may share with the class a quick explanation of the connection with a bigger science concept, or may move on to the next subject. Yet this research also points to the critical importance of building sense-making skills and connections with real-life experiences to improving science comprehension and motivation (National Research Council (NRC), 2011). Likewise, national science standards make building sense-making skills and connections an imperative (Next Generation Science Standards (NGSS), 2018). Participation in these scientific practices builds the habits of mind that drive deeper understanding and motivation toward science learning.
Over the last 10 years in the United States there has been a renewed effort to establish national science standards for teaching and learning (National Research Council, 2012). Amid calls for students to develop an understanding of science beyond rote memorization of facts and procedures, practice-based inquiry approaches have gained popularity in both the classroom and research realms. The Next Generation Science Standards (NGSS) call for a three-dimensional approach to science instruction, around the pillars of 1) science practices, 2) disciplinary core ideas, and 3) crosscutting concepts (Next Generation Science Standards (NGSS), 2018). Beyond the NGSS standards, a long line of experimental evidence in the learning sciences has shown the importance of explanation in developing well-integrated and transferable knowledge in science (e.g., Chi et al., 1989; Chi and Wylie, 2014; McNamara, 2017). Yet few teachers are supported to engage their students in authentic sense-making and inquiry (Wee et al., 2007) and thus fall short of realizing the vision of students thinking and acting like scientists.
There are a variety of possible reasons why students spend little classroom time engaging in sense-making. Low teacher confidence and knowledge in science content and pedagogy are significant obstacles (Crawford and Capps, 2016). In addition, science notebooks, a key student tool in scientific inquiry, exhibit construct-irrelevant barriers that make the inquiry and sense-making process less accessible for many students. These barriers hinder students' ability to clearly express understanding of science events and concepts. Writing skills such as spelling, fluency, recording or transcribing of data, and composing text may interfere with expression (Graham and Hebert, 2010). Additionally, there is little experimental research addressing these barriers.

SNUDLE Overview
The Science Notebook in a Universal Design for Learning Environment (SNUDLE) is a digital science notebook created to help students, particularly those with identified learning disabilities, as well as those who are at risk, struggling, and unmotivated, better realize the benefits of science notebooks used in the scientific inquiry process (Figure 1 and Figure 2). Universal Design for Learning (UDL) was chosen as the SNUDLE design framework to minimize construct-irrelevant barriers to learning and provide just-in-time supports for active science learning and effective science notebook use (Rose and Meyer, 2002; Meyer et al., 2014). The research literature indicates that science notebooks can be used to support active science learning and the development of scientific literacy (Hargrove and Nesbit, 2003; Klentschy, 2005). However, teachers typically use science notebooks primarily in a mechanical way (to record data, procedures, or definitions) and rarely to support the development of deep understanding through the active science learning process (Baxter et al., 2001; Ruiz-Primo et al., 2004). Given these challenges, SNUDLE was designed to help teachers use evidence-based tools and strategies to provide all students with access to the general science curriculum and meet the high academic standards set forth in the Next Generation Science Standards (Thurlow and Wiley, 2004).
Like traditional science notebooks, SNUDLE provides students a structured and supported space to collect, organize, and display observations and data in science; space to reflect and make sense of inquiry experiences; and multiple opportunities to demonstrate understanding at every stage of the investigation through text answers and data tables. However, with UDL as the design framework (CAST, 2018) and digital technology as the platform, SNUDLE differs from traditional science notebooks in several key ways.
First, SNUDLE was developed according to accessibility guidelines from the World Wide Web Consortium (W3C Web Accessibility Initiative (WAI), 2018), Section 508 of the Rehabilitation Act (29 U.S.C. 794d), and the National Center on Instructional Media (2006). Text-to-speech technology is built into the notebook interface with real-time highlighting to support simultaneous access to auditory and visual processing, as well as word-by-word English-to-preferred language translation (in this study Spanish, Vietnamese, and Arabic language translations were available), keyboard-accessible actions, and a multimedia glossary to provide just-in-time support for vocabulary use and development. These features remove barriers faced by many students with learning disabilities whose literacy skills would interfere with the efficacy of materials that depend on proficiency in reading and writing. They also help support students for whom proficiency in English is a barrier, and others who would learn more effectively through use of built-in accessibility features.
Second, SNUDLE leverages contextual support to develop and reinforce effective science learning behaviors. Pedagogy is built into the interface design itself, guiding students and teachers in the process of active science learning and the effective use of science notebooks. For instance, students are prompted to think about making direct reference to their data and observations in support of their conclusions and to use relevant vocabulary from their inquiry experiences throughout their notebook entries.
Third, in addition to the student interface, SNUDLE contains a teacher interface which includes features that facilitate active science learning. For instance, teachers are prompted and supported to provide feedback on the students' entries that may include corrective information, alternative strategies, information to clarify ideas, or encouragement to engage in the scientific process.

Research Questions
Using a mixed-methods research approach, this study sought to quantitatively and qualitatively address the impact of SNUDLE on student outcomes, and to explore how SNUDLE was used by students and teachers in the classroom. Specific research questions (RQs) include:
RQ1 (Overall Impact): Did students who used SNUDLE in fourth grade science classrooms achieve greater gains in science learning and STEM self-efficacy when compared to students who used traditional paper-based science notebooks?
RQ2 (Differential Impact): Did the impact of SNUDLE vary among students with disabilities and those whose families speak a language other than English or Spanish?
RQ3 (Implementation): Do the usage patterns indicate that SNUDLE was implemented with fidelity by students and teachers? What were the SNUDLE features most commonly used by the different subgroups of students?
RQ4 (Perception): What were teachers' perceptions of the usefulness of SNUDLE in science learning, engagement, and self-efficacy?
FIGURE 1 | Screenshot of the "Explain" page in SNUDLE in which students summarize their analysis of the data and provide their evidence and reasoning as to how the experiment they conducted answers the focus question.
Frontiers in Education | www.frontiersin.org November 2021 | Volume 6 | Article 719672

Quantitative Design
A cluster randomized controlled trial was conducted within a large urban school district from August 2017 to January 2019. There were seven participating elementary schools with a total of 36 participating fourth-grade teachers across two cohorts; the first cohort (August 2017 to January 2018) included 29 teachers and the second cohort (August 2018 to January 2019) included seven teachers. Parental consent forms were distributed on the first day of school for each cohort. While all students who enrolled in the study's full inclusion general education science classes were eligible to participate in the study, we received parental consent and student assent to participate for 683 students (372 intervention, 311 comparison) as of August 2017 for cohort 1 and 219 students (97 intervention, 122 comparison) as of August 2018 for cohort 2, for a total of 902 students (469 intervention and 433 comparison) across both cohorts.
Teacher randomization was conducted after parent consent and student assent were completed. Stratifying by cohort and school, blocking schedule, years of teaching experience, and confidence level in teaching science, the 36 teachers were randomized into the SNUDLE or business-as-usual (BAU) condition; 20 teachers were randomized to the SNUDLE condition (16 in cohort 1 and 4 in cohort 2) and 16 teachers were randomized to the BAU comparison condition (13 in cohort 1 and 3 in cohort 2). All teachers taught using the district's STEMscopes curriculum, which included a traditional paper science notebook or worksheets. Treatment teachers used the SNUDLE science notebook in lieu of the paper notebook, while comparison condition teachers continued use of traditional paper science notebooks/worksheets.

Qualitative Design
Structured interviews were conducted with both treatment and comparison teachers in both years of the study. A random sample of teachers was selected for interviews. In year one, 10 interviews were conducted with treatment teachers and nine with comparison teachers. At the conclusion of the intervention in year two, eight treatment teachers and five comparison condition teachers were interviewed.

SNUDLE Intervention Teacher Training
Training was conducted on two occasions by CAST staff for both cohorts. In the summer, both intervention and comparison teachers participating in the SNUDLE study received a full-day training during which they were introduced to the purpose and goals of the study and received professional development on the principles of UDL. After randomization, teachers in the intervention condition then received an additional 4 hours of training, during which time CAST staff provided them with SNUDLE materials and introduced them to the program and SNUDLE's educational philosophy and approaches to pedagogy. During the training, intervention teachers were provided multiple opportunities to practice using both the teacher- and student-facing SNUDLE views and role-played how they would use SNUDLE in their classrooms. For instance, with student-facing SNUDLE, teachers practiced using the multiple modalities of responding, such as speech-to-text, drawing features, and typing or finger-writing responses on the tablet. On the teacher-facing SNUDLE, teachers used the dashboard to view student progress and practiced using the comments features to provide just-in-time feedback on student progress. Upon implementation in their classrooms, intervention teachers received ongoing coaching and support from CAST via weekly newsletters that provided best practice tips and tricks. They also received individualized support when requested and/or when classroom observations suggested the need for additional implementation support and technical assistance. Intervention teachers in both cohorts implemented the intervention from September to January of the school year in which they participated (2017-18 or 2018-19), during which the SNUDLE tablets were integrated into 18 investigations across nine curriculum units. Teachers in both the intervention and comparison groups were required to use the district-mandated science curriculum and adhered to the district's pacing guide for administering each lesson in a prescribed timeline. The only difference between the SNUDLE intervention group and the BAU comparison group was the use of the SNUDLE digital science notebook rather than a paper-based notebook when completing the investigations.
FIGURE 2 | Screenshot of the Analyze page in SNUDLE in which students see the focus question and have access to their data. When responding to questions, users have several options for how to respond including writing, speaking, drawing or uploading.

Student Demographics
From the students' school records, we obtained sociodemographic data on gender, race/ethnicity, free or reduced-price lunch status, dual language learner status, and language spoken at home. The most common languages spoken at home among participating students were English, Spanish, Vietnamese, and Arabic. Disability status was identified from administrative data indicating that the student has an individualized education program (IEP).

Curriculum-Based Unit Tests (i.e., Quiz Scores)
Assessment items from STEMscopes, the school district's curriculum, were used as academic achievement measures closely aligned with the curriculum content. The curriculum developers categorized the items by the four levels of Bloom's Taxonomy: Understand, Apply, Analyze, and Evaluate. Because SNUDLE seeks to provide opportunities to improve higher level science thinking, the items we selected predominantly focused on Analyze and Evaluate questions. One of the nine unit tests or quizzes was dropped from analysis because a natural disaster caused school closure at the beginning of the study that interrupted teaching and quiz administration. The sum of correct responses across the remaining eight end-of-unit quizzes served as a proximal outcome measure. The standardized Cronbach coefficient alpha for the STEMscopes unit tests was 0.88.

District Common Assessment in Science
The District Common Assessment (DCA) in Science was used as a pretest measure of academic performance in science and administered to both intervention and comparison students. The school district developed the DCA as a measure of intermediate-term goals and objectives and it is administered at the end of the first and second semesters each school year. The DCA was designed to assess concepts from the Texas Essential Knowledge and Skills (TEKS), with many of its items based on the validated State of Texas Assessment of Academic Readiness (STAAR®) program (Human Resources Research Organization, 2016). DCA items that are not directly aligned with TEKS were derived from tests published by several commercial publishing companies.

Measures of Academic Progress
For a broader measure of science knowledge, we administered the Northwest Evaluation Association's MAP test of science at the end of both Year 1 and Year 2 data collections. The MAP science test is a formative measure that covers domains of Earth, life, and physical sciences. It is a computerized adaptive assessment consisting of 50 multiple-choice items with four or five options. In the Northwest Evaluation Association's item development, all items match the assessable sections of a set of academic content standards both in breadth of content and depth of knowledge. MAP tests have been validated to link to content standards in all 50 states and have excellent technical characteristics (Northwest Evaluation Association, 2011). Third-grade state standardized achievement scores in English Language Arts measured by the STAAR program were collected as a baseline measure of academic performance for the fourth grade participating students in this study.

Motivation for Science (MFS)
A key outcome for SNUDLE is its ability to increase not just students' knowledge of science practices, but also their motivation to learn science. The MFS is an 18-item survey intended to measure the latter. The MFS consists of subscales for the following four constructs: self-efficacy, interest, desire for challenge, and comfort using computers. Reliability of the MFS is 0.85; for the experimental sample, it was 0.89 (Rappolt-Schlichtmann et al., 2013). The MFS was administered to all participating and consented students at pretest after teacher randomization, and again at posttest after completion of the intervention.

Implementation Measure
Implementation fidelity was measured for the two components of the SNUDLE intervention: 1) the training and ongoing coaching support provided to the teachers in the SNUDLE intervention group and 2) implementation of SNUDLE by the teacher and students for each of the 15 investigations included in the analysis. For the teacher training, implementation fidelity involved full attendance at two training sessions (total of 8 hours of training) and participation in observation and ongoing coaching from the SNUDLE development team and district-level science specialists trained in SNUDLE. To meet fidelity thresholds, teachers were expected to attend both trainings in their entirety and participate in at least one observation and coaching session with their specialist or SNUDLE researcher.
For documentation of SNUDLE implementation in the classroom, we relied on the SNUDLE software usage data collected while the teacher and students interacted with the SNUDLE online notebook during each of their science lesson's experimental investigations. SNUDLE records user data and provides a dataset describing instances of actions such as logins, which pages and investigations students visit, when and on which pages students create content, which features of SNUDLE students use (e.g., text to speech, language translation), and when teacher users provide written feedback to student users in SNUDLE. From this dataset, we measured implementation fidelity based on the quantity or dosage of SNUDLE use, the quality or depth of use, and the frequency with which accessibility features afforded by SNUDLE were accessed by the students.
Dosage. The study data collection period covered the first semester of the district's science curriculum for each of the two cohorts. Dosage was measured in two ways: 1) Number of investigations accessed in the classroom. Pairs or groups of students used SNUDLE together after logging into a single student's account. Therefore, SNUDLE access during investigations was identified at the classroom rather than individual student level. To ensure that student access was intentional rather than accidental log-ins or used for purposes other than the investigation, at least 15% of students in the classroom were required to log onto SNUDLE at the same time to be identified as SNUDLE used for an investigation; 2) Teacher usage. To measure direct teacher use, we calculated the number of days the teacher logged into SNUDLE.
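The two dosage counts described above can be sketched in a few lines. This is a minimal illustration, not the study's actual processing code: the log format (pairs of a session key and a student ID), the function names, and the idea of approximating "at the same time" by grouping logins under one session key per investigation are all assumptions made for the example.

```python
from collections import defaultdict


def investigations_accessed(logins, class_size, threshold=0.15):
    """Count sessions in which at least `threshold` of the class logged in.

    logins: iterable of (session_key, student_id) pairs. The session key is a
    hypothetical stand-in for "one investigation in one class period".
    """
    students_by_session = defaultdict(set)
    for session_key, student_id in logins:
        students_by_session[session_key].add(student_id)
    # A session counts as SNUDLE use only if enough distinct students logged in.
    return sum(
        1
        for students in students_by_session.values()
        if len(students) / class_size >= threshold
    )


def teacher_login_days(login_dates):
    """Number of distinct calendar days on which the teacher logged in."""
    return len(set(login_dates))
```

Because pairs or groups of students shared one login, the count is made at the classroom level; a single stray login in a class of 20 falls below the 15% threshold and is not counted as an investigation.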
Quality of usage. There are three different steps students worked through on an investigation in SNUDLE: 1) collect data, 2) analyze data, and 3) explain findings. Each step in the investigation process is a separate webpage of the digital notebook. Accessing the "Analyze data" and "Explain findings" pages suggests a higher level of SNUDLE usage, as these pages were developed with the purpose of engaging students' deeper scientific thinking. To measure usage of these pages, we calculated the percentage of days on which students created or edited content on the Analyze or Explain pages out of the total number of days on which students created or edited content on any of the three steps/pages.
Accessibility features. In addition to looking at implementation of SNUDLE via dosage and quality of usage, we also examined the backend usage data to understand what types of SNUDLE features were used by the students with and without disabilities. Specifically, we counted how often students used each of a variety of features designed to make the SNUDLE notebook more accessible. These features include the draw tool, text-to-speech functionality, translation into three different languages, glossary functions, and sentence starters.
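The quality-of-usage percentage defined above (days with Analyze or Explain edits, divided by days with edits on any page) can be illustrated as follows. The event format and the page labels "collect", "analyze", and "explain" are hypothetical names chosen for this sketch; the actual SNUDLE usage dataset may be structured differently.

```python
def quality_of_usage(edit_events):
    """Percentage of editing days that included Analyze or Explain edits.

    edit_events: iterable of (day, page) pairs, where page is one of
    "collect", "analyze", or "explain" (assumed labels).
    Returns None when there were no editing days at all.
    """
    all_days = set()
    deeper_days = set()
    for day, page in edit_events:
        all_days.add(day)
        if page in ("analyze", "explain"):
            deeper_days.add(day)
    if not all_days:
        return None
    return 100.0 * len(deeper_days) / len(all_days)
```

Counting days rather than individual edits keeps one very active session from dominating the measure.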

Qualitative Teacher Interviews
A 21-question interview was designed to address research question four (RQ4) to better understand teacher perceptions of student performance and use of science notebooks. The research team designed a scripted protocol and conducted the teacher interviews by phone at the conclusion of the intervention period each year. Treatment and comparison teachers were asked the same questions, with references to SNUDLE or to the traditional science notebook adapted to match the condition of the interviewee. Some questions were closed-ended, with response options formatted using a Likert-like scale, while other questions solicited open-ended responses. Each interview required approximately 30-45 min to complete, and teachers and interviewers engaged in dialogue during the process. Interviews were recorded and transcribed to allow for comprehensive analysis.

Quantitative Data Analysis
This study conducted descriptive analysis of baseline and outcome variables for the whole sample, students with disabilities, and students whose home language was a language other than English or Spanish.
Primary estimates of the intervention effect were derived from intent-to-treat (ITT) analyses in which all students remained in the group to which they were originally assigned, regardless of attrition or movement across groups. Regardless of the level of implementation, these analyses compared all students in treatment teachers' classrooms to their peers in comparison teachers' classrooms. Two-level Hierarchical Linear Modeling (HLM; Raudenbush and Bryk, 2002) was performed to estimate the impact of SNUDLE while accounting for the nesting of students within teachers. Level 1 is the student level and level 2 is the teacher level. Dependent variables were the MAP, DCA, MFS, and total unit quiz score. Independent variables included a constant, a pretest score on the same outcome measure (or the STAAR state standardized reading test score when a pretest on the same outcome measure was not available), demographic characteristics, and a treatment indicator. The treatment indicator variable is at level 2. Covariates in this study were derived from the extensive literature on predictors and correlates of students' academic achievement (Hair et al., 2006; Zhai et al., 2011). Specifically, we included the following student-level covariates in the two-level ITT HLM because previous studies have shown that these background characteristics are related to achievement: gender, race, language spoken at home, IEP status, low-income status, and dual language learner status. In the model, $Y_{is}$ is the posttest score on the MAP, DCA, or quiz for student $i$ of teacher $s$; $\mathrm{Pretest}_{is}$ is the baseline test score; $\mathrm{Treatment}_{s}$ is 1 for intervention teachers and 0 for comparison teachers; $\mathrm{COV}_{is}$ is the set of student-level covariates; and $\varepsilon_{is}$ and $\mu_{0s}$ are the individual and teacher random effects.
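One way to write a two-level random-intercept model consistent with the variable definitions above is sketched here. The coefficient symbols ($\beta$, $\gamma$) and the reading of the subscripts as student $i$ and teacher $s$ are assumed notation for illustration, not taken from the original specification; $\varepsilon_{is}$ and $\mu_{0s}$ denote the student-level and teacher-level random effects.

```latex
\begin{aligned}
\text{Level 1 (student):}\quad
  Y_{is} &= \beta_{0s} + \beta_{1}\,\mathrm{Pretest}_{is}
            + \boldsymbol{\beta}_{2}^{\top}\mathrm{COV}_{is} + \varepsilon_{is} \\
\text{Level 2 (teacher):}\quad
  \beta_{0s} &= \gamma_{00} + \gamma_{01}\,\mathrm{Treatment}_{s} + \mu_{0s}
\end{aligned}
```

Under this form, $\gamma_{01}$ is the covariate-adjusted difference between intervention and comparison classrooms, which is the coefficient an effect size calculation would use as its numerator.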
Hedges' g effect sizes for the treatment impact were calculated by dividing the HLM coefficient for the intervention's effect by the pooled treatment and comparison group standard deviation (What Works Clearinghouse, 2017). This study did not impute missing data; cases with missing data were listwise deleted from the HLM. Treatment impacts for student subgroups were estimated using HLM with the data restricted to either students with disabilities or students whose families spoke a language other than English or Spanish.
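The effect size calculation described above can be sketched as follows. The function names and input values are illustrative only. The optional small-sample correction factor, 1 - 3/(4N - 9), is the standard Hedges adjustment; whether it was applied in this study is not stated, so it is off by default here and should be read as an assumption.

```python
import math


def pooled_sd(n_t, sd_t, n_c, sd_c):
    """Pooled standard deviation of the treatment and comparison groups."""
    return math.sqrt(
        ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    )


def hedges_g(hlm_coefficient, n_t, sd_t, n_c, sd_c, small_sample_correction=False):
    """Effect size: HLM treatment coefficient divided by the pooled SD."""
    g = hlm_coefficient / pooled_sd(n_t, sd_t, n_c, sd_c)
    if small_sample_correction:
        # Standard Hedges correction; its use here is an assumption.
        g *= 1 - 3 / (4 * (n_t + n_c) - 9)
    return g


# Made-up numbers for illustration only, not values from the study.
example = hedges_g(hlm_coefficient=1.0, n_t=10, sd_t=2.0, n_c=10, sd_c=2.0)
```

Using the model-adjusted coefficient in the numerator, rather than the raw difference in means, means the effect size reflects the difference after adjusting for pretest scores and covariates.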

Qualitative Analysis
The research team adopted a consensus scoring structure in which we used a method of collaborative qualitative analysis involving six phases: 1) preliminary organization and planning, 2) open discussion coding, 3) development of coding labels and structures, 4) initial testing of the coding structure, 5) agreement and finalizing the coding, and 6) reviewing and identifying themes (Richards and Hemphill, 2018).
The coding team used the qualitative analysis software program MAXQDA to code and analyze teacher interview results (VERBI Software, 2019). Initially all interviews were coded for condition. Researchers then identified "effects" by themes, which included: 1) motivation, 2) engagement, 3) independence, 4) collaboration, 5) confidence, 6) building understanding, and 7) teacher self-efficacy/teaching practices. The third level of coding included treatment teachers' perceptions of student use of supports and scaffolds in SNUDLE. The next coding level identified student subgroups by demographic characteristics: English Language Learners (ELLs), students who are "struggling" (defined by teachers as students who generally had difficulty in reading, math, and writing, generally in the lower quartile of the class), and students with disabilities. Additionally, the research team assigned coded segments a positive, neutral, or negative rating to evaluate where impact of the SNUDLE digital notebook was evident for treatment condition teachers.

Attrition Analysis
Although randomizing teachers and their students to conditions should result in statistically equivalent groups, high overall attrition or differential attrition between the treatment and comparison groups may jeopardize the initial balance and bias the impact estimate (What Works Clearinghouse, 2017). Our data analysis therefore began with an attrition analysis. Across seven outcomes at posttest, the treatment group attrition rate ranged from 1 to 16%, the comparison group attrition rate ranged from 7 to 15%, and the differential attrition rate ranged from 1 to 6%. According to the WWC standards (2017), the overall and differential attrition rates are low for this study.
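For readers unfamiliar with the two attrition quantities, the sketch below shows how they are typically computed for a single outcome. The function name, argument names, and example counts are hypothetical; the study's own calculations may have differed in detail (for example, in how the baseline sample was defined for each outcome).

```python
def attrition_rates(baseline_t, analyzed_t, baseline_c, analyzed_c):
    """Attrition for one outcome, as proportions between 0 and 1.

    baseline_*: students in each group at the start of the study.
    analyzed_*: students in each group with usable data at posttest.
    """
    rate_t = 1 - analyzed_t / baseline_t
    rate_c = 1 - analyzed_c / baseline_c
    overall = 1 - (analyzed_t + analyzed_c) / (baseline_t + baseline_c)
    return {
        "treatment": rate_t,
        "comparison": rate_c,
        "overall": overall,
        # Differential attrition is the gap between the two group rates.
        "differential": abs(rate_t - rate_c),
    }


# Made-up counts for illustration only, not values from the study.
rates = attrition_rates(baseline_t=100, analyzed_t=90, baseline_c=100, analyzed_c=80)
```

Overall and differential attrition are judged jointly: a given differential rate is more tolerable when overall attrition is low.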

Descriptive Analysis
After the attrition analysis, a descriptive analysis was conducted for SNUDLE students and comparison students. Table 1 presents the student background characteristics by condition. Table 2 describes baseline and posttest scores by condition for the whole sample, the subsample of students with disabilities, and the subsample of students whose home language was not English or Spanish. Statistical significance of the difference between the SNUDLE and comparison groups at baseline was determined from HLM analysis. SNUDLE participants were not significantly different from comparison students on baseline assessment scores for the whole sample or for the two subsamples.

Intent-To-Treat Analysis Results (RQ1)
Primary estimates of the SNUDLE impacts were derived from the ITT analyses. Table 3 demonstrates that no significant differences were detected between SNUDLE and comparison fourth-grade students among the overall sample on any academic or motivation outcomes at the end of the 5-month intervention.

Subgroup Analysis Results (RQ2)
Our subgroup analysis showed that the effect of SNUDLE was significant and large among students with disabilities. For example, among the students with disabilities, the SNUDLE group scored significantly higher on motivation in science (Efficacy: ES = 0.88, p < 0.05; Interest: ES = 0.82, p < 0.05; Desire for challenge: ES = 1.01, p < 0.05) and science academic achievement (Total quiz score: ES = 0.82, p < 0.01). Furthermore, SNUDLE appeared to have a small but statistically significant positive impact on science academic performance among students whose home language was not English or Spanish (ES = 0.35, p < 0.01).

Fidelity of Implementation Descriptive Results (RQ3)
Our implementation analysis includes descriptive analysis of three aspects of implementation: teacher training and ongoing support, dosage and quality of usage by teacher/classroom, and student use of SNUDLE features. First, all SNUDLE teachers across the two cohorts attended the two training sessions in their entirety (total of 8 hours of training). Additionally, they were observed by and received post-observation coaching from district-level science specialists trained in SNUDLE or the SNUDLE research team at least one time during the research study. Therefore, there was 100% implementation fidelity for the teacher training and professional development component of the SNUDLE intervention. Second, the SNUDLE usage data confirmed that all SNUDLE teachers implemented the intervention. As shown in Table 4, the dosage descriptive data show that the mean number of investigations accessed on SNUDLE was 6.53, which is a little less than half of the total number of investigations SNUDLE offered. The mean number of days teachers accessed SNUDLE was 9.63, which suggests that teachers did not access SNUDLE for each investigation. We expected higher-quality SNUDLE usage to be reflected in using SNUDLE to analyze data and explain findings rather than only to collect data, as this indicates that students were more actively engaged with SNUDLE as they created or edited the content of the Analyze or Explain pages. Students in SNUDLE classrooms on average spent 46.65% of the time on the Analyze or Explain pages. Overall, the backend usage data suggest an insufficient level of implementation within SNUDLE classrooms.
Third, when we examined the SNUDLE features that were used by students, we found that the five most frequently used SNUDLE features were the draw tool, sentence starters, data table, glossary, and text-to-speech (Table 5). There was no difference in SNUDLE feature usage by disability status except that students with disabilities used the draw tool functionality less often than their peers without disabilities (t = 2.22, p < 0.05). Although we expected students with disabilities might have used SNUDLE features as often as their peers without disabilities, the t-test results indicated that students with disabilities were willing to use and benefit from SNUDLE. This study did not find differences in feature usage by home language except that students whose home language was a language other than English or Spanish used the "set language" functionality more often than their peers whose home language was English (t = 2.70, p < 0.01). The "set language" function assigns the language used by the translation feature when translating from English to a target language.

Qualitative Findings (RQ4)
During qualitative data analysis, researchers identified several frequently occurring codes in the interview dataset. These codes were broadly categorized as 1) subgroups of student users, 2) SNUDLE features, and 3) impact on student learning and performance. While teachers in the comparison condition (traditional science notebook) observed students in their classrooms responding positively to use of a science notebook to support the inquiry process, more teachers in the SNUDLE condition described their students as engaged, collaborative, and motivated by science learning. Several teachers using SNUDLE in their classroom attributed these student behaviors to specific features unique to the digital notebook environment.

The sample size is the number of students who had both pretest and posttest for a particular outcome.
STAAR, State of Texas Assessments of Academic Readiness; MFS, motivation for science.
Frontiers in Education | www.frontiersin.org November 2021 | Volume 6 | Article 719672
Teachers in both conditions across the 2 years consistently noted positive impact from both SNUDLE and traditional science notebooks on students who struggle, including those with disabilities, in contrast to students not identified with learning challenges. Although both SNUDLE and comparison condition teachers noted the positive impact of science notebooks on this subgroup, more treatment condition teachers (16 out of 17) than comparison teachers (8 out of 14) noted this impact. Treatment teachers also observed that students were using sense-making skills in inquiry science: "SNUDLE guided (students) more (versus) those who struggled with the lab and concept learning without SNUDLE. They had high engagement with SNUDLE... now they can connect the lab to the questions and the intent of the lab they did." A smaller but meaningful impact was described for students for whom English is not their primary home language. Many treatment teachers (8 out of 17) noted positive outcomes for these students, whereas fewer comparison teachers (2 out of 14) noted a positive impact. As one teacher using SNUDLE described in their interview, "ESL students are writing more, elaborating more. This is a great celebration."
To analyze student learning performance, researchers coded for effects by recurring themes as listed in the methods section. Of the seven themed categories, three emerged most frequently in the teacher interviews: collaboration, engagement, and motivation. These student behaviors were noted during hands-on science investigations and use of science notebooks. Collaboration was characterized as student interactions with peers, engagement as student participation and on-task focus in science, and motivation as maintaining engagement and interest. While both treatment and comparison teachers observed positive impact on collaboration, engagement, and motivation, this impact was reported at a higher rate by treatment condition teachers (see Table 6).
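Because the treatment and comparison groups interviewed differ in size (17 and 14 teachers), the teacher counts reported above are easier to compare as percentages. The short sketch below only converts the counts given in the text; the dictionary layout and the `share` helper are illustrative choices, not part of the study's analysis.

```python
# Teacher counts as reported in the interview findings: (teachers noting impact, teachers interviewed).
counts = {
    "students who struggle": {"treatment": (16, 17), "comparison": (8, 14)},
    "home language not English": {"treatment": (8, 17), "comparison": (2, 14)},
}


def share(noted, total):
    """Percentage of interviewed teachers who noted a positive impact."""
    return 100 * noted / total


for subgroup, by_condition in counts.items():
    for condition, (noted, total) in by_condition.items():
        print(f"{subgroup}, {condition}: {noted}/{total} = {share(noted, total):.1f}%")
```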
Finally, during interviews, many treatment teachers noted specific features designed to support pedagogy, usability, and accessibility within SNUDLE for their students (Table 7). While teachers did not speak to all features available in SNUDLE, they did identify eight specific features when speaking of the impact of the tool on learners. Those features were: uploading images, drawing tool, use of tables, sentence starters, translation, speech-to-text, text-to-speech, and multimedia glossary. Teachers noted benefits to their students most frequently from the multimedia glossary, drawing tool, sentence starters, speech-to-text, and text-to-speech (Table 8).
Overall, teachers who used SNUDLE in their classrooms found the tool to be helpful in supporting their students and themselves. Factors such as structure and organization of the digital notebook were noted as helpful for many students because they enabled students to focus on the new material of the experiment, as opposed to deciding what to write or do next. Teachers also appreciated that having science notebook materials online helped students access those records and responses at any time, from anywhere. Science notebook materials were not lost in student desks or left at home. Additionally, teachers found benefits in SNUDLE providing multiple means of action and expression on all prompts and questions: "SNUDLE guides you from beginning to end. It allows variation in how students respond: drawing, typing, speech-to-text. It allowed students to respond in a variety of ways."

No need to run Benjamini-Hochberg multiple comparison adjustment because treatment impact was not significant. The HLM controls for pretest, gender, race, free or reduced lunch status, dual language learner status, home language, and IEP status. †p < 0.1, *p < 0.05, **p < 0.01.
Number of investigations accessed by more than 15% of students. Teacher usage is measured by the number of days teacher used SNUDLE with any usage data. High level of usage is measured by the percentage of student-days in which students created or edited their content on either the analyze or explain pages (higher-order thinking) over the total days they created or edited their content on collect, analyze, or explain pages.
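The high-level usage measure defined in the note above can be sketched as follows. This is one possible reading of that definition, not the study's actual computation. The event-log layout of `(student_id, day, page)` tuples and the function name `high_level_usage_pct` are assumptions made for illustration.

```python
from collections import defaultdict

HIGHER_ORDER_PAGES = {"analyze", "explain"}
ALL_PAGES = {"collect", "analyze", "explain"}


def high_level_usage_pct(edit_events):
    """Percent of student-days with a create/edit action on an analyze or explain page.

    edit_events: iterable of (student_id, day, page) tuples, one per create/edit
    action. This log layout is hypothetical, chosen only for illustration.
    """
    pages_by_student_day = defaultdict(set)
    for student_id, day, page in edit_events:
        # Only collect, analyze, and explain pages count toward the denominator.
        if page in ALL_PAGES:
            pages_by_student_day[(student_id, day)].add(page)
    total = len(pages_by_student_day)
    if total == 0:
        return 0.0
    higher = sum(1 for pages in pages_by_student_day.values() if pages & HIGHER_ORDER_PAGES)
    return 100 * higher / total


# Hypothetical log: four student-days, two of which include analyze or explain edits.
events = [
    ("s1", "d1", "collect"), ("s1", "d1", "analyze"),
    ("s1", "d2", "collect"),
    ("s2", "d1", "explain"),
    ("s2", "d2", "collect"), ("s2", "d2", "glossary"),
]
print(f"{high_level_usage_pct(events):.1f}% of student-days involved higher-order pages")
```

Counting student-days rather than raw edit actions keeps one very active student from dominating the percentage.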

DISCUSSION
Findings indicate that SNUDLE holds promise for improving academic performance in science and confidence and motivation among some of the most vulnerable student populations. SNUDLE was created to support students, especially those with disabilities, in applying and demonstrating understanding during the scientific inquiry process. Previous research on science notebooks suggests that active science learning is particularly challenging for struggling learners who usually have low motivation for science learning (Englert et al., 1988; Graham, 1990; Swanson, 1999). Students with disabilities struggle not only with understanding science concepts, but also with all aspects of active science learning (Rappolt-Schlichtmann et al., 2013). When compared to students with disabilities in science classes that used traditional paper-based science notebooks, students with disabilities who received the SNUDLE intervention in their science classes had significant positive outcomes in their motivation to learn science and desire for challenge, as well as their ability to demonstrate understanding of science concepts as measured by content area quizzes. Our subgroup analysis for students with disabilities showed that the effect of SNUDLE was significant when analyzing student efficacy (p < 0.05), interest (p < 0.05), desire for challenge (p < 0.05), and science academic achievement based on the total quiz scores (p < 0.01).
The outcome is the number of times accessibility features were used by each student. Two independent-sample t-tests were conducted to test whether there were differences in feature usage by disability status or home language status. *p < 0.05.
Teacher perceptions were measured by the number of teachers who referenced these areas of impact in their interviews. These measurements were then filtered to include only positive impact.

Interview code: Teacher comments

Teaching and learning
Structure: They can't lose their packet. It's (SNUDLE) all there on the iPad. They can log in and change something if they want to. The easiest part was the organization of SNUDLE, from data to analyze to explain. That makes it easier for students, when one small part is finished, it answers, "What do I do next?" SNUDLE makes connections between tasks and instruction. (I see) lots of enthusiasm. They become independent learners, they become peer tutors, and they become scientists.
Transfer: Students used skills learned in SNUDLE in math and reading. There was generalization and transfer of skills from SNUDLE to their other work.
Teaching practice: SNUDLE helped my teaching because it helped me figure out the proper time to ask these deeper questions. SNUDLE helps me as teacher try to figure out a road map for the students moving from no idea to conceptual understanding.
Technology: I thought they would decrease interaction by using technology because they were all working on their own devices, but it actually increased their interaction with each other because they were all interacting with the project. Now they are focused in on technology and are excited by it. They feel like they have power and control over their learning now that they have the iPad in their hand.
Collaboration and engagement: Students were more collaborative in SNUDLE than when working on their own worksheets or doing the packets they typically would have done if they weren't using SNUDLE. They (students) do more collaboration compared to previous years, because the step-by-step process helped to understand questions and steps. They were able to answer and respond to questions. They like collaborating, and they actually became peer tutors, shared background knowledge and to relate/share science experiences with one another, and builds excitement about what to do. In short, they were more motivated.

Feature use
Multimedia glossary: With tablet(s), after a couple of times, even bilingual kids were looking at vocabulary words. (I) challenged them: whatever you write, try to use vocab words. Students liked the challenge/reward (getting stars) when they used vocabulary. Vocab was clearer to them because of the pictures and translations for bilingual students.
Drawing tool: (SNUDLE) had really nice drawings and they could explain what they understood. Not that detailed, but the drawing complemented the idea and I could see what they knew. The parts that really excited them was the ability to draw, find pictures, or they could go online and find a picture. It gave students options on what they want to do, which was very helpful for some of them.
Sentence starters: Students who struggled now want to show me, "Look, I chose this stem, and here is why." They have more ownership. The sentence starters were some of the biggest benefits. ESL students know how to begin.
Speech to text (STT): The accommodations that are embedded (STT) help me see in writing what (students) are thinking. A lot of students struggle in spelling. When they speak it (in SNUDLE) they can see the word that they are trying to say. They can verbally express what they think in science but can't connect to the writing.
Text to speech (TTS): For language acquisition/learning language, the TTS was beneficial. This worked out really well. For students who struggle, they are able to be independent. They can hear (text) being spoken to them and don't need the teacher to read to them.
Tables: It's (SNUDLE) already organized for them. The data tables, the worksheet is there. The students don't have to create the data table. They can focus on what they want to put down vs. how to put it down.

These interview excerpts were selected to be representative of frequently occurring themes from teacher interviews.

Additionally, SNUDLE appeared to have a small but statistically significant positive impact on science academic performance among students whose home language was not English or Spanish (ES = 0.35, p < 0.01). These students had higher academic test scores in science when compared to their counterparts in comparison classrooms. It should be noted that the academic gains in these subgroups were based on curriculum-based quizzes and district-administered assessments and did not extend to the MAP science test, a national curriculum-based assessment. Thus, it appears that the benefit of SNUDLE in encouraging scientific inquiry and science performance might be more easily captured by these two science achievement measures that were more directly aligned with the district science curriculum than with a more generalizable science assessment.
These findings provide promising evidence that use of a UDL science notebook, designed to support students and their teachers during active science learning, improves science achievement and motivation outcomes for students with disabilities and students whose home language is not English or Spanish. Furthermore, by incorporating the principles of UDL, in which a foundational tenet is recognizing and respecting variability in learning and learners (CAST, 2018), SNUDLE was designed to overcome construct-irrelevant barriers and provide contextual supports that promote active science learning for all student users (Rappolt-Schlichtmann et al., 2013). While SNUDLE was originally designed with disabilities in mind, its features can be universally leveraged to support active engagement and learning for all students, and SNUDLE was indeed shown to improve academic outcomes among students whose home language was neither English nor Spanish. SNUDLE levels the playing field for students with disabilities and non-Spanish-speaking ELLs. One possible explanation for the improved academic performance among these students is that traditional paper-based science notebooks may have inhibited full understanding and expression of science learning for students below a certain level of literacy proficiency (students with disabilities and those learning English). SNUDLE's embedded accessibility features and scaffolded supports enabled the students to overcome these barriers and focus on the learning at hand. Qualitative data portray the extent to which teachers perceived benefits of using SNUDLE with their students who are learning English. Several teachers described seeing positive effects on learning for these students, particularly in SNUDLE's ability to support language acquisition and science writing skills. In fact, SNUDLE usage data indicate that students whose home language was neither English nor Spanish used the Set Language (language preference) feature more often than their peers whose home language was English. SNUDLE provides flexible, interactive learning spaces and options for all students to demonstrate science understanding, including those whose home language is neither English nor Spanish.
SNUDLE allows students to draw on diverse language strengths and resources and to move from a limited learning space to a flexible space in which they can express both science and language learning through multiple means, such as drawing, uploading images, text, and tables (Wilmes and Siry, 2020).
The usage data also revealed that all students, regardless of disability or home language status, used some accessibility features, with the most frequently used features being the drawing feature, sentence starters, glossary, and text-to-speech. Teacher interviews provided insights on the value these features provided to students. As one treatment condition teacher observed, "For students who struggle, the drawing is a bonus, and using the sentence stems is a lifesaver." These features, as well as the UDL-supported Collect, Analyze, and Explain pages, may reduce the effects of barriers to science learning, which might be particularly useful for students with disabilities.
While the study found several positive outcomes for students with disabilities and students whose home language was reported as a language other than English or Spanish, with the most common "other" languages being Vietnamese and Arabic, no measurable impact was detected on the aggregate participant student population.

LIMITATIONS AND FUTURE RESEARCH
The impacts of SNUDLE observed among students with disabilities and those whose home language is other than English or Spanish are all the more notable given that fidelity of implementation did not reach ideal levels for two of the three FOI measures. Insufficient implementation is a possible reason for the lack of significant findings in the overall student sample, in contrast to the positive effects reported in the previous study conducted by Rappolt-Schlichtmann et al. (2013). There are several reasons why implementation was not ideal. During the first year of the study, a natural disaster occurred, which impacted the whole district and resulted in challenges to consistent implementation, especially during the first few months of the study. In addition, teachers at times had inconsistent access to the tablets, making it challenging to routinely incorporate SNUDLE into their science lessons as planned. Finally, the study represents first-time use of SNUDLE by teachers, which required them to get up to speed quickly and may not have provided sufficient exposure to achieve mastery.
While the findings from this efficacy study are promising, future studies should consider the potential of using SNUDLE under conditions where it is fully integrated and implemented with no systemic disruptions, such as natural disasters, over multiple years. Replication efforts are needed to substantiate current findings and allow a cumulative synthesis of results. Future research may address the limitations of the current work by collecting individual SNUDLE usage data (instead of the mixture of individual and small group usage data collected by the current study) over time, and by clarifying the mechanism behind the impact, particularly focusing on critical features of implementation and how they are associated with student science outcomes. Ideally, future studies should use a randomized controlled trial design (for example, by randomizing teachers to different levels of teacher training and ongoing support, or by randomizing students to groups that use different combinations of SNUDLE critical features) and follow the participants over multiple years. Doing so would allow future studies to examine which implementation features impact which student outcomes and whether the impact of SNUDLE fades or intensifies over time. As mentioned by a previous study (Paek and Fulton, 2021), the impact of a digital notebook is limited by students' ability to use the tool. Future studies should also examine how teachers can enhance student learning by helping them understand the value of each feature and how to use each feature in different science inquiry phases.
Given the identified limitations and challenges to implementation fidelity, the current data also lead to additional questions and rich opportunities for ongoing research regarding teacher supports for SNUDLE. The current SNUDLE implementation included two sessions of teacher training, updates by the research team, and coaching by onsite science specialists. Future studies may address questions such as: What impact would further coaching and supports for teachers have on student outcomes? What impact might a varied menu of training, mentoring, and ongoing coaching have on these outcomes? How could providing teachers with access to data visualizations of student behavior and performance impact teacher data-based decision making for science instruction? We intend to consider these questions in future studies and call upon the field to address these and other questions to support building students' understanding and sense-making skills in science education at the elementary level.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The CAST institutional review board approved this study. Researchers obtained signed consent forms from all adult study participants and signed parent/guardian consent forms and student assent forms for all children in the study.

AUTHOR CONTRIBUTIONS
Study conception and design, JB, JY, and TH. Acquisition of data, KR, TH, JY, and KF. Analysis of Data, XW, JY, AO, TH, and JB. Interpretation of data, JY, KF, AO, TH, and XW. Drafting of manuscript, JY, XW, AO, TH, and KF. All authors contributed to the article and approved the submitted version.

FUNDING
This material is based upon work supported by the U.S. Department of Education's Institute of Education Sciences under Grant R324A160008.