Text Complexity Modulates Cross-Linguistic Sentence Integration in L2 Reading

Cross-linguistic influences (CLI) in first-language (L1) and second-language (L2) reading have been widely demonstrated in experimental paradigms with adults at the word and sentence levels. However, less is known about CLI in adolescents during naturalistic text reading. Through eye-tracking and behavioral measures, this study investigated expository reading in functionally English monolingual and Spanish (L1) - English (L2) bilingual adolescents. In particular, we examined the role of L1 (Spanish) sentence integration skills among the bilingual adolescents when L2 school texts contained challenging syntactic structures, such as complex clauses, elaborated noun phrases, and anaphoric references. Results of generalized multilevel linear regression modeling demonstrated CLI in both offline comprehension and online eye-tracking measures that were modulated by school text characteristics. We found a positive relationship (i.e., facilitation) between L1 sentence integration skills and L2 English text comprehension, especially for passages with greater clause complexity. Similar main, but not modulatory, effects of sentence integration skill were found in online eye-tracking measures. Overall, both language groups appeared to draw upon similar reading component skills to support reading fluency and comprehension when component skills were measured only in English. However, differential patterns of association across languages became evident when those skills were measured in both L1 and L2. Taken together, our findings suggest that bilingual adolescents’ engagement of cross-linguistic resources in expository reading varies dynamically according to both language-specific semantic knowledge and language-general sentence integration skills, and is modulated by text features, such as syntactic complexity.


INTRODUCTION
Reading and comprehending complex connected school texts can be challenging for both first-language (L1) (Cain and Oakhill, 2004; Lervåg et al., 2018) and second-language (L2) (Lesaux et al., 2006; Melby-Lervåg and Lervåg, 2014) learners. Readers must draw upon linguistic knowledge and skills of different types (orthographic decoding, vocabulary, syntax, and discourse knowledge) to rapidly create and update dynamic representations of meaning (Kintsch and Van Dijk, 1978; Perfetti and Stafura, 2014). For bilinguals who are also biliterate, the reading process is further complicated by the presence of two languages, drawing upon component skills that can include more than one orthography, lexicon, and syntactic system. Cross-linguistic influence (CLI) describes the effects that bilinguals' languages may have on each other, even when only one language is the target of communication (Alonso, 2019). Cummins' (1979) early Linguistic Interdependence Hypothesis proposed a common underlying linguistic proficiency that would allow bilinguals to transfer linguistic skills across languages given sufficient L1 proficiency. This hypothesis proposes that in addition to their emerging L2 abilities, young bilinguals draw upon L1 knowledge and skills in developing L2 literacy.
Educational studies of cross-linguistic influence using behavioral measures of oral language broadly defined (e.g., Lesaux et al., 2010; Nakamoto et al., 2012; Proctor et al., 2010; Relyea and Amendum, 2020) have generally found a weak or no contribution of L1 oral language to skilled text reading and comprehension in L2, or have found that L1 contributions are mediated by L2 language skills. In contrast, extensive behavioral research has found that L1 can support L2 text reading through shared component processes, such as phonological awareness (Durgugoglu et al., 1993; Bialystok et al., 2005; Prevoo et al., 2016) and orthographic decoding (Geva and Siegel, 2000; Lindsey et al., 2003; Lesaux and Siegel, 2003; Verhoeven and van Leeuwe, 2009; Melby-Lervåg and Lervåg, 2011; Kremin et al., 2019), particularly when L1 and L2 share writing systems. Similarly, cross-linguistic transfer of L1 vocabulary knowledge, especially as children move from the stages of "learning to read" to "reading to learn," can bolster L2 reading comprehension (e.g., Proctor et al., 2005; Pasquarella et al., 2012; van den Bosch et al., 2020). However, in social contexts such as the United States, where there is often attrition of the minority home language as children's schooling progresses in the societally dominant language (i.e., English), L1 proficiency can also exert a negative influence on L2 comprehension (e.g., Swanson et al., 2008; Kieffer, 2012; Ordóñez et al., 2002). In these contexts of subtractive bilingualism (Cummins, 2000), conditions which promote L2 proficiency may also exacerbate L1 attrition, giving the impression of negative interference as higher levels of L1 are correlated with lower levels of L2 language knowledge and vice versa.
Although CLI in reading has been widely examined at the level of sound and word representations (see Genesee and Geva, 2006; Chung et al., 2019), there has been comparatively little education research on CLI involving the integration of these word representations into sentence- and text-level meanings, a higher-order process needed for skilled reading and comprehension of school texts. While rich lexical representations are important building blocks in reading, comprehension requires more than understanding words in isolation. Readers must create a context or situation model of connected concepts and rapidly integrate new words as they are read (Van Dijk and Kintsch, 1983). The Reading Systems Framework (RSF, Perfetti and Stafura, 2014) is one account of this memory-dependent integration process, which postulates that successful readers rapidly access orthographic and rich lexical representations from written text. These rich lexical representations include morphological and other word features, such as aspect and category, that contribute to "high-quality" lexical representations. In order to comprehend a text, readers hold these representations in memory while integrating them into a holistic representation of the sentence (or sentences), called the textbase. Sentence-level integration requires the reader to both compose and decompose meaning beyond the lexical level (e.g., using word order, referential rules, and other sentence-level cues). The present study investigates this integrative processing (illustrated in Figure 1) as a potential locus of L1 linguistic ability that may support L2 reading efficiency and comprehension.
Here, we investigated the nature of word-to-sentence integration using naturalistic reading tasks drawn from education curricula in order to investigate the complex reading behaviors of adolescents who must learn in a developing L2. According to the RSF, linguistic knowledge contributes to the construction of meaning beyond the word level; however, the mechanism by which word-level representations are integrated at the sentence and text level is not precisely specified in the RSF. There are currently two prevalent types of models of sentence integration: memory-based integration models and predictive processing models. On the one hand, memory-based accounts of complex meaning representation have traditionally been formulated around the "bottom-up" processing of separate representations of the lexicon, morphology, and syntax that are held together in working memory to create sentence-level meaning (e.g., Cunnings, 2017). On the other hand, computational and neurobiological accounts of complex meaning representation focus on the role of "top-down" predictive or expectation-based meaning construction (e.g., Levy, 2008). Tightly controlled experimental work can provide rigorous evidence validating or disproving these types of models. However, such work also trades off experimental control and ecological validity and insight into complex behavior in rich discourse contexts (e.g., see Vanderwal et al., 2019 for a discussion of naturalistic stimuli). In order to conduct transdisciplinary educational research, the processing account in the present naturalistic study is grounded in an ecologically relevant approach in which we adopt a top-down perspective whereby different reading component skills are revealed in behavioral and eye movement measures of reading efficiency and comprehension.
Findings are thus observational and seek to extend prior monolingual literature indicating that same-language sentence integration skills are important contributors to reading comprehension, particularly starting in late elementary and middle school (Low and Siegel, 2005; Nation and Snowling, 2000; Lesaux et al., 2006; Geva and Farnia, 2012; Proctor et al., 2012; Jeon and Yamashita, 2014; Gottardo et al., 2018; Babayigit and Shapiro, 2019; Brimo et al., 2017). In particular, little is understood about the cross-linguistic contributions of L1 sentence integration abilities when bilingual individuals are reading in their L2, a common requirement for bilingual learners with L2 English in schools with English as the medium of instruction. The current study focuses on CLI in sentence-level integration, examining how L1 (Spanish) sentence integration abilities are associated with both online text processing efficiency (assessed by eye-tracking measures) and offline text comprehension (assessed by post-reading comprehension measures) during L2 (English) text reading in Spanish-English bilingual adolescents, compared with their functionally monolingual peers who spoke and read primarily in English.

Educational Studies of Cross-Linguistic Influence in Text Comprehension
Although cross-linguistic influence involving the lexicon has been widely investigated, there is inconclusive evidence regarding potential CLI of syntax on L2 reading comprehension. Several behavioral studies involving children educated in L2 English-dominant environments found no relationship between L1 syntactic skills and L2 reading comprehension. In a study of adolescent Spanish-English bilinguals who were newcomers to the United States with prior schooling in Spanish, Garrison-Fletcher et al. (2019) found that general and academic reading comprehension, but not Spanish syntactic skills, were positive predictors of English text comprehension. Similarly, Swanson et al. (2008) assessed 68 third-grade Spanish-English bilinguals with both Spanish and English reading component measures. Although L1 Spanish morphosyntactic knowledge positively predicted English reading comprehension, it did not do so uniquely: this relationship was explained by other English measures. With 123 children in grades 3-5, Leider et al. (2008) found a negative relationship between Spanish syntax and English sentence judgment performance but no relationship with English comprehension measured by cloze or multiple-choice tasks. Kieffer (2012) found a similar negative relationship between 295 kindergarten children's Spanish oral language skills more broadly measured and later English reading comprehension in nationally representative longitudinal data. These three studies were conducted in the United States, where the L2 (English) is the dominant language used almost exclusively in public schools. Only one study with participants in this setting produced a contrasting result. In a longitudinal investigation of syntactic skills among 156 Spanish-English bilingual upper elementary students, Proctor et al. (2017) found that L1 syntax, measured with a sentence formulation task in the second grade, predicted English oral language and reading comprehension in the fifth grade.
There is thus mixed evidence of cross-linguistic syntactic transfer in L2 English-dominant settings at different ages, with study findings variously suggesting interference, positive transfer, or no influence of L1 syntactic skills on L2 text comprehension.
In contrast, behavioral studies conducted in immersive bilingual education settings where bilingual literacy is explicitly instructed have found a consistent positive relationship between L1 syntax and L2 reading comprehension during the preschool to upper elementary years. Gabriele et al. (2009) examined a small sample of Spanish-English bilinguals in a bilingual education preschool program. Preschoolers' performance on native Spanish structures (i.e., more complex structures) predicted better performance on an English reading readiness measure. In unpublished studies, Sohail (2015) tested a cross-sectional sample of emergent Canadian English-French bilingual first graders and their third-grade peers who had completed 2 years of French immersion using a word order correction task. In the first grade, but not in the third grade, L1 English word order correction was positively associated with L2 French reading comprehension. Among bilingual Cantonese-English first through third graders studying a dual-language curriculum in Hong Kong, both cross-sectional (Siu and Ho, 2015) and longitudinal (Siu and Ho, 2020) studies found that L1 Chinese word order and morphosyntactic skills positively predicted L2 English text comprehension measured in the third grade. L2 English word order and morphosyntactic skills mediated both cross-sectional and longitudinal relationships when accounting for oral language and cognitive skills, suggesting that syntactic skills represent a common underlying linguistic proficiency contributing to reading comprehension. Similarly, for Spanish-English bilingual upper elementary students in dual immersion programs, Phillips  found that L2 (English) reading comprehension was positively associated with a composite measure of academic language in the L1 Spanish that included syntactic skills in addition to vocabulary and genre-related knowledge. 
Although each of these studies investigated different age groups (preschool to upper elementary) and different measures of syntactic skill (word order, morphosyntax, and sentence complexity), they share common findings of cross-linguistic syntactic transfer to L2 reading among children in bilingual immersion education settings. Consistent with Cummins' (1979) hypothesis that readers could draw upon a common underlying linguistic proficiency to support L2 comprehension only when they possessed adequate L1 proficiency, these findings in bilingual education settings suggest that school instruction in both L1 and L2 may support the positive influence of L1 skills on L2 reading, or at least may mitigate variability in L1 proficiency that impedes this cross-linguistic influence.
Particularly in the United States' educational context where English-only schooling has predominated in recent decades, these mixed results from studies investigating CLI may arise from heterogeneity in either or both participant language characteristics and educational environments. For example, United States-centric studies of Spanish-English bilinguals often examine students designated in the public school system as English learners, who are predominantly United States born, but may in fact range from recent immigrants from Spanish-speaking countries with extensive Spanish-language educational experience to second generation heritage speakers with minimal Spanish proficiency (e.g., see Luk and Christodoulu, 2016 for a discussion). The present study, conducted in the United States context, seeks to understand how students for whom Spanish is the first spoken language (L1) may draw upon this L1 to support text processing and comprehension in English, their second spoken language (L2). We considered Spanish as the participant's L1 if it was the earliest language spoken by the participant on a daily basis at home, even for participants who may have been exposed to English at an early age by listening to media, community interactions, or family conversation in the context of the United States where English is the societally dominant language.
Variability found in CLI effects, both within L2 English-dominant school environments and across L2-dominant compared to immersive bilingual education settings, additionally suggests that CLI is more likely to be observed in contexts where both L1 and L2 literacy are systematically taught and that skills with L1 oral vs. written syntax may have different influences on L2 written comprehension. L1 syntactic skills may thus support L2 reading comprehension; however, it is still unclear for whom and under what behavioral measures and environmental conditions this cross-linguistic support may take place.

Eye-Tracking Studies of Cross-Linguistic Influence in Text Processing
Unlike traditional behavioral tasks, eye-tracking allows for a direct, naturalistic, and temporally sensitive measure of the cognitive processes underlying reading behaviors as they unfold (Rayner, 2009). A large body of eye-tracking literature on bilingual sentence processing has found that L1 syntactic knowledge impacts online L2 sentence reading most often by slowing text processing, for example in complex syntactic structures, such as causal connectives (van den Bosch et al., 2018), anaphora (see Godfroid, 2019 for a review), and referential clauses (see Dussias et al., 2010; Rossi et al., 2019 for a review). Eye-tracking studies of connected text reading (though not focused on CLI) have also found differences in L1 and L2 reading behavior, including longer fixations and more saccades in the L2, contributing to longer reading times. These differences can vary according to levels of self-reported L1 and L2 exposure (Whitford and Titone, 2012; Whitford and Titone, 2017) and objective L1 and L2 proficiency (Cop et al., 2015; Whitford and Joanisse, 2018).
In addition to reader characteristics, effects of syntactic complexity of the text have been found in bilingual adults' adaptation to word category combinations (e.g., article-adjective-noun). Snell and Theeuwes (2020) presented a naturalistic narrative text in both Dutch and English and found that higher frequency structures facilitated Dutch-English bilinguals' eye movement reading behavior in both languages. Differences for bilingual adults reading narrative passages in L1 vs. L2 have been found for syntactic structures such as gerunds, participial phrases, and relative clauses (De Groot, 2018). Among monolingual children, word position during expository text reading has been found to impact offline comprehension (de Leeuw et al., 2016). Furthermore, syntactic processing difficulties have been reported among monolingual children for anaphoric structures in short narrative passages (Joseph et al., 2015).
Eye-tracking studies of connected text reading have primarily focused on adults, leaving reading behavior in children relatively less understood. However, prior studies involving both monolinguals (Joseph et al., 2015; Reichle et al., 2013; Whitford and Joanisse, 2018) and bilinguals (Whitford and Joanisse, 2018) have reported age differences in reading performance, with children exhibiting more fixations and regressions, longer fixations, and shorter saccade amplitudes than adults (Whitford and Joanisse, 2018). Although the maturation of oculomotor control appears to be largely complete by puberty, or around 12 years of age, children's eye movement reading behavior only approximates that of adults as their language proficiency and word-reading automaticity develop (Blythe and Joseph, 2011; Reichle et al., 2013). Adolescent middle schoolers in K-12 education contexts thus are likely to have adult-like oculomotor capacities as well as word-level decoding skills but are still developing in the language and higher-level complex reading skills needed for efficient text processing.
To our knowledge, studies have yet to investigate the interaction of reader characteristics, syntactic characteristics of text, and reading comprehension using both offline comprehension tasks and eye movement measures of reading in L2 adolescent readers. Kuperman et al. (2018) examined a broad range of reader and text characteristics in native English monolingual university students, and found at the passage level that syntactic complexity of passages predicted several measures of online eye movement behavior, as well as offline reading comprehension. Results for reader characteristics varied depending upon the specific eye movement measure examined, with word reading fluency and nonverbal reasoning emerging as the common predictors of online text processing. Overall in this study, there was little modulation of reader characteristics by syntactic complexity of the text when comparing less complex to more complex passages. Although prior eye-tracking studies have found that syntactic complexity of texts influences eye movement reading behavior in monolingual children (e.g., measured as sentence length in German-speaking children: Tiffin-Richards and Schroeder, 2018) and adults (e.g., English-speaking adults: Kuperman et al., 2018), none have focused on word-to-sentence integration in bilingual adolescents who are experiencing the transitional demand of learning complex academic knowledge through a developing second language. Prior eye-tracking research, which has employed highly controlled sentences as stimuli (and thus may lack ecological validity), has found that both complex syntactic features of the text and L1 proficiency influence L2 text processing and comprehension.
Syntactic features prevalent in school texts, such as elaborated noun phrases, lengthy relative clauses, conceptual anaphora, and the use of distinctive connectives, have been found to pose particular challenges for written text comprehension (Scott and Balthazar, 2010; Uccelli et al., 2015a) in students generally and in L2 learners more specifically (Phillips Galloway and Uccelli, 2019). Cross-linguistically, L2 English academic skills and reading comprehension have been found to be positively associated with L1 Spanish academic vocabulary (Lubliner and Hiebert, 2011) and with academic language skills broadly measured as combined lexical, syntactic, and discourse skills (Phillips Galloway and Uccelli, 2019). However, how L2 learners process distinctive syntactic features of written school texts is not well understood, and little prior research has examined L2 syntactic processing beyond morphosyntax in written academic discourse comprehension. Concurrently, prior education research involving bilingual adolescents has found that complex syntactic text features impede L2 reading comprehension, as a function of L2 proficiency. However, there is still little work connecting these two bodies of literature and examining how syntactic characteristics of L2 texts modulate the cross-linguistic relationship of individual reader characteristics and the comprehension of naturalistic school-based texts. To address this critical gap, the current study asks: How do complex syntactic text features and individual differences in cross-linguistic sentence integration skills, as well as their interactions, affect 1) online processing and 2) offline comprehension of naturalistic school-based texts in bilingual adolescent readers?
We examined individual differences in sentence integration skills in both the L1 (Spanish) and L2 (English), as well as in word decoding and vocabulary knowledge in both languages in order to control for lexical effects. Based on prior research, we expected that better L1 and L2 sentence integration skills would be associated with higher levels of L2 text comprehension (Lesaux et al., 2006) and with more efficient online L2 reading (Cop et al., 2015; Gollan et al., 2008; Whitford and Titone, 2012; Whitford and Titone, 2015; Whitford and Titone, 2017). We included two adolescent participant groups: functionally monolingual native English speakers, and Spanish (L1)-English (L2) speakers acquiring English in school.
Three categories of text features were included in the study: phrase complexity, clause complexity, and the degree of anaphoric reference. Based on the RSF model (Perfetti and Stafura, 2014), we expected that these features, which present challenges to sentence integration processes, such as those involved in parsing, sequencing, and combinatorial analysis/synthesis, would be a potential locus of cross-linguistic influence in which individual differences in L1 sentence integration skills might contribute to L2 text comprehension. Because syntactic complexity presents challenges to sentence integration, readers may draw upon integrative resources such as syntactic representations or processing biases developed in the L1 to cope with these challenges. As previous research has found that complex syntactic text features are associated with reading comprehension difficulties in L2 child and adolescent readers (Uccelli et al., 2015a; Uccelli et al., 2015b; Phillips Galloway and Uccelli, 2019), we expected that phrase complexity, clause complexity, and anaphoric references would all negatively modulate the expected positive effect of cross-linguistic sentence integration skills on offline text comprehension. Based on similar findings in a self-paced L2 reading study in Spanish-English bilingual adolescents which found that words in more syntactically complex passages were read more slowly than words in simpler ones (Kim et al., 2018; Mulder et al., 2020), we expected a negative impact of these complex text features on online text processing. The study's hypotheses were that: 1) greater L1 proficiency in sentence integration would be associated with higher levels of L2 text reading efficiency and comprehension, and 2) syntactic complexity of the text would negatively modulate the expected positive association of L1 syntactic integration skills with L2 text reading efficiency and comprehension.

Participants
Sixty-five typically developing adolescents, aged 11-15 years, participated in the current study. Participants resided in the Boston Metropolitan Area and attended English-instruction schools. In accordance with Institutional Review Board guidelines (Harvard University IRB16-0866), both child assent and parental consent were obtained. Participants were compensated $50 for their time. The final sample included 59 participants; three were excluded due to eye-tracking issues and another three were excluded due to low nonverbal reasoning scores (standard scores < 86.6 or more than two standard deviations below the sample mean on the Kaufman Brief Intelligence Test-2; KBIT-2 Matrices; Kaufman and Kaufman, 2004), which can influence reading comprehension outcomes (Quinn and Wagner, 2018). A parental version of the Language and Social Background Questionnaire (LSBQ; adapted from Luk and Bialystok, 2013; Anderson et al., 2018) assessed participants' demographic background, language history, and familial language use. All participants demonstrated heterogeneous language backgrounds reflective of minority language speakers found in United States classrooms. Twenty-nine participants were native L1 (English) speakers, with no L2 immersion experience and minimal proficiency in an additional language. Most were enrolled in an introductory foreign language class as part of the standard middle school curriculum, and several of these L1 English speakers had been enrolled in some form of beginning language class sporadically since an early age. Thus, they were all functionally monolingual.
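As an illustration only, the nonverbal reasoning exclusion rule described above can be sketched in Python. The sketch assumes the cutoff is the sample mean minus two sample standard deviations; the function name, participant IDs, and scores are hypothetical and are not study data or the authors' code.

```python
from statistics import mean, stdev


def low_score_exclusions(scores, n_sd=2.0):
    """Return IDs whose score falls more than n_sd sample SDs below the sample mean.

    `scores` maps a (hypothetical) participant ID to a standard score.
    """
    values = list(scores.values())
    # Assumption: the cutoff uses the sample SD (n - 1 denominator).
    cutoff = mean(values) - n_sd * stdev(values)
    return sorted(pid for pid, score in scores.items() if score < cutoff)


# Invented scores for illustration; not the study's data.
toy_scores = {"p01": 100, "p02": 105, "p03": 110, "p04": 95,
              "p05": 102, "p06": 98, "p07": 108, "p08": 60}
excluded = low_score_exclusions(toy_scores)
```

With these invented scores, only the single very low score falls below the cutoff. A rule of this kind depends on the sample itself, so the threshold would shift if the sample changed.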
We considered Spanish as the participant's L1 if Spanish was the earliest language spoken on a daily basis at home, even for participants who may have been exposed to English at an early age by listening to media, community interactions, or family conversation. As illustrated in Table 1, thirty participants were identified as L1 Spanish speakers, who spoke Spanish from birth at home and used it 47% of the time, on average. Twenty-two of these participants were from Spanish-speaking countries (Mexico 8, Central America 5, South America 7, Spain 2) and eight had only lived in the United States. The age at which participants were first exposed to English on a regular basis was correspondingly heterogeneous, ranging from birth to 14 years of age (M = 7.8, SD = 3.7); however, all participants spoke Spanish at home and did not start to use English regularly until at least school age, or approximately 5-6 years of age.
As illustrated in Figure 2, which shows on the x-axis the difference scores between Spanish and English vocabulary measures as a proxy for language proficiency dominance, the sample displayed a continuous range of variation on multiple and intersecting, but not fully overlapping, dimensions of language experience. For example, one participant who had lived in the United States for 10 years had markedly strong English dominance with a difference between English and Spanish vocabulary scores of over 40 points. This student could thus be considered a heritage speaker with attrition in home language but was nonetheless reported by parents to speak and read Spanish at home approximately 30% of the day and was similar in this regard to other students with English learner designations and with fewer years of residence in the United States. The multiple and at times disparate dimensions of language experience displayed in Figure 2 (relative language proficiency, relative language usage, and relative community immersion) are reflective of the ecological variation in minority language speaker backgrounds found in United States classrooms. Also typical of United States classrooms, most of our minority language speaking sample had been initially identified as English learners upon school entry and had exited from this designation at various points in time prior to their study participation. Because initial English learner classification in United States schools is often based upon speaking a home language other than English, this designation encompasses a heterogeneous range of language proficiencies and language dominance as represented in our sample. Figure 2 illustrates this wide distribution of students eligible for English learner services along all the other dimensions of language use, including language dominance, daily Spanish spoken, and years in the United States.
The two language groups were matched on gender (Fisher's exact p = 0.42), age (Kruskal-Wallis χ², p = 0.06), nonverbal IQ (KBIT Matrices, Kaufman and Kaufman, 2004; Kruskal-Wallis χ², p = 0.07), verbal working memory (Digits Backward, Reynolds and Voress, 2007; Kruskal-Wallis χ², p = 0.05), and rapid naming (Letter, Number, 2-set, and 3-set subtests; Wolf and Denckla, 2005; Kruskal-Wallis χ², p = 0.16). These tasks were administered in the participant's preferred language. The two language groups were also matched on timed English word reading (Sight Word Efficiency, Torgesen et al., 2011; Kruskal-Wallis χ², p = 0.96) and untimed English word reading (Word Identification, Woodcock-Muñoz Language Survey with Normative Update, WMLS-R-NU, Woodcock et al., 2005). Spanish-language measures were only administered to L1 Spanish speakers due to L1 English speakers' reported lack of Spanish proficiency and language experience. L1 Spanish speakers' English and Spanish skills were balanced on average, with no significant difference between English and Spanish vocabulary (Wilcoxon signed-rank p = 0.76) or cloze (p = 0.77) scores. Word identification skills for this group were higher on average in Spanish than English (p < 0.001), although English scores on this subtest remained significantly above the population mean of 100 (p < 0.001). See Table 2 for participants' cognitive and language proficiency characteristics and Supplementary Table SA for psychometric information on the standardized assessments.

Cloze Integration Task
Cloze tasks are commonly used as measures of word predictability, as well as individual differences in vocabulary, prediction, and syntactic skills. These tasks were originally developed as an indicator of reading comprehension (Taylor, 1953) and are frequently used as such in both L1 (see Collins et al., 2018 for a meta-analysis) and L2 (Tremblay, 2011; Trace, 2020) education research and practice. However, there is ongoing debate regarding the precise skills that cloze tasks assess, with studies of concurrent validity suggesting that these measures do not align well with other forms of reading comprehension assessment, such as post-reading questioning, whether in multiple-choice or open-ended format (Cutting and Scarborough, 2006; Francis et al., 2006; Keenan et al., 2008; Keenan and Meenan, 2014). There is broad consensus in the research literature that the manner in which the task is constructed strongly influences the skills it taps for both L1 (Gellert and Elbro, 2013) and L2 speakers (Alderson, 1979; Kleijn et al., 2019). At least for L1 speakers, multi-sentence thematic contexts tap into global discourse knowledge (Clark and Kamhi, 2014), while cloze assessments such as the WMLS-R and other Woodcock passage comprehension formats with single-sentence stimuli and more weakly constraining thematic contexts draw more strongly on lexical knowledge (Leider et al., 2013), word familiarity (Cutting and Scarborough, 2006; Francis et al., 2006), and syntactic knowledge (Cutting and Scarborough, 2006; Keenan et al., 2008; Deacon and Kieffer, 2018).
We employed a cloze task to assess participants' sentence integration abilities in each language. We administered the English WMLS-R-NU Passage Comprehension to all participants and the Spanish Comprensión de textos to L1 Spanish speakers only (Woodcock et al., 2005). The Spanish subtest is a parallel and equated form of the WMLS-R English Passage Comprehension subtest (Woodcock et al., 2005). This cloze task asks the participant to read one or two sentences and orally supply a single missing word. The task starts with stimuli consisting of a single, very brief sentence, expanding to two sentences and/or a longer sentence stimulus as the difficulty of the task increases. Thus, the WMLS-R-NU cloze tasks primarily assess word-to-text integration at the sentence level, particularly for developmental levels of language proficiency. All regression analyses of the cloze tasks also included English and Spanish vocabulary and word identification scores to control for the role of lexical knowledge and word familiarity in sentence comprehension (Francis et al., 2006). Age-scaled standard scores were used in analysis.

Experimental Stimuli
Stimulus design paralleled that of Whitford and Joanisse (2018) and consisted of four expository paragraphs (∼100 words each), taken from the Basic Reading Inventory (BRI), 10th ed., a grade-leveled reading inventory commonly used in schools to evaluate reading fluency and comprehension (Johns, 2008). One  Bieber et al., 2015), a screening assessment also widely used in schools (Good and Kaminsky, 2002).
The six factual open-ended comprehension questions were drawn from a set of ten provided in the BRI for each passage and orally administered. Participants responded to questions orally and received dichotomous scores (correct = 1; incorrect = 0) for each comprehension question following the BRI scoring procedure (minimum score = 0 and maximum score = 6 per paragraph).
For each paragraph, we obtained lexical features known to influence reading behavior (Clifton et al., 2007): word age of acquisition (AoA), word frequency, word predictability, orthographic and phonological length, and the number of Spanish cognates (Table 3). Word AoA values were derived from Brysbaert and Biemiller (2017) test-based word AoA ratings. Word frequencies were obtained from the SUBTLEXus corpus (Brysbaert and New, 2009). Orthographic and phonological length, as well as Levenshtein distances for cognates, were calculated using the R (R Core Team, 2013) package stringdist (van der Loo, 2014). Word predictabilities were obtained through a computerized cumulative cloze task (following Whitford and Titone, 2012;Whitford and Titone, 2014;Whitford and Titone, 2017), where a separate sample of adult L1 English speakers (n = 30) guessed the words of each paragraph on a word-by-word basis. Accuracy scores were averaged across participants to create a word-level probability of cloze prediction. These word-level features were next averaged over paragraphs to produce mean characteristics for these linguistic units. As seen in Table 3, paragraphs had approximately the same number of words and Spanish cognates. Texts differed only in the average word frequency of their content words (p = 0.02), with the lowest log frequency mean in text level 5 and the highest in text level 7. Texts did not differ in average word frequency, predictability, AoA rating, length, proportion of content to function words, or syllable length (all Kruskal-Wallis p > 0.05).
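The two averaging steps for word predictability can be illustrated with a minimal sketch. It assumes a hypothetical response record of the form (participant, paragraph, word position, target word, guess) and exact-match scoring after lowercasing; the study's actual scoring rules and data layout are not specified here, so the function names and toy data are illustrative only.

```python
from collections import defaultdict

def word_predictability(responses):
    """Average cloze accuracy across participants for each word position.

    responses: iterable of (participant, paragraph, word_index, target, guess).
    Returns {(paragraph, word_index): proportion of correct guesses}.
    """
    hits = defaultdict(list)
    for _participant, paragraph, index, target, guess in responses:
        correct = guess.strip().lower() == target.strip().lower()
        hits[(paragraph, index)].append(1.0 if correct else 0.0)
    return {key: sum(vals) / len(vals) for key, vals in hits.items()}

def paragraph_means(word_probs):
    """Average the word-level probabilities within each paragraph."""
    by_paragraph = defaultdict(list)
    for (paragraph, _index), prob in word_probs.items():
        by_paragraph[paragraph].append(prob)
    return {par: sum(ps) / len(ps) for par, ps in by_paragraph.items()}

# Hypothetical toy data: two raters guessing two words of one paragraph.
toy = [
    ("s1", "P5", 0, "the", "the"),
    ("s2", "P5", 0, "the", "a"),
    ("s1", "P5", 1, "river", "river"),
    ("s2", "P5", 1, "river", "River"),
]
probs = word_predictability(toy)
means = paragraph_means(probs)
```

In this toy case the first word is guessed by one of two raters and the second by both, so the paragraph mean falls between the two word-level values.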

Paragraph Syntactic Complexity
Each paragraph was evaluated using three classes of syntax measures: 1) classic measures that employ length, or number of words, as an indicator of syntactic complexity; 2) word-level syntax measures; and 3) syntax measures beyond the word level (e.g., examining phrases, clauses, sentences, etc.). Most sentences were composed of a single utterance called a T-unit, itself containing only a single clause. Classic syntactic complexity measures were first generated using the L2 Syntactic Complexity Analyzer (L2SCA; Lu, 2010). L2SCA output was reviewed manually. The greatest numeric variability across paragraphs is captured by sentence length (in number of words), verb phrases, coordinate phrases, and complex nominals. See Supplementary Table SB, for more on L2SCA measures. Next, word-level syntax measures were generated using the Tool for the Automatic Analysis of Cohesion 2.0 (TAACO; Crossley et al., 2019), which employs the Stanford Natural Language Parser (NLP). The word-level parses were inspected for errors manually and corrected where needed. TAACO analysis produces part-of-speech (POS) information as well as type-token measures for key function words at the lemma level. Semantic similarity is represented by Word2Vec (Mikolov et al., 2013), a neural network-derived representation of semantic distance among words and phrases. Supplementary Table SC, provides further information on the TAACO lexico-syntactic complexity measures. Finally, syntactic complexity measures beyond the word level were generated using the Tool for the Automatic Analysis of Syntactic Complexity (TAASC; Kyle and Crossley, 2018), which computes complexity indicators using Stanford NLP parses. Unlike the TAACO measures, TAASC provides syntactic complexity indicators that reflect relationships among words in a sentence and across sentences in a paragraph. 
TAASC indicators of syntactic sophistication (average lemma log frequency and average verb-argument construction log frequency) measure the overlap with lemmas and constructions found in the academic sub-corpus of the Corpus of Contemporary American English (COCA; Davies, 2009, 2010 as cited in Kyle and Crossley, 2018). Phrase elaboration, clause complexity, and anaphoric reference were quantified by conducting a nonlinear principal components analysis (PCA) for each characteristic. The nonlinear PCA for phrase complexity incorporated phrase measures from the parsing tools described above: adjective modifiers, nominal dependents, direct and prepositional object dependents, and prepositional phrases as nominal modifiers. The nonlinear PCA for clause complexity included clause measures: relative clause modifiers, dependent clauses, clausal direct objects, and clausal conjunctions. For anaphoric reference, the variables entered into the PCA were pronoun density, pronoun-noun ratio, and demonstratives. The first principal component from each of these PCAs was used as the corresponding text characteristic predictor in subsequent linear regression models. The first principal component accounted for 91% of total variance for phrase complexity, 93% for clause complexity, and 67% for anaphoric relations. See Supplementary Table SE for details.

Design and Procedure
After consenting, participants completed the eye-tracking task, where they silently read one practice paragraph and four experimental paragraphs at their own pace. After reading each paragraph, they answered six comprehension questions without being able to refer back to the passage. The order of the experimental paragraphs was counterbalanced across the two language groups in a Latin Square design. Lastly, participants completed the behavioral tasks, which were presented in a fixed order: KBIT-2, RAN/RAS, TOWRE-2, English WMLS-R, and then TOMAL-2 for L1 English speakers and KBIT-2, RAN/RAS, Spanish WMLS-R, TOMAL-2, English WMLS-R, and TOWRE-2 for L1 Spanish speakers. Total participation duration was about 3 hours.

Language of Testing
Instructions for the nonverbal reasoning task (KBIT-2) as well as the full measures of lexical access/naming speed (RAN/RAS) and working memory (TOMAL-2) were administered in the participant's preferred language. Thirteen out of thirty L1 Spanish speakers chose to complete the RAN/RAS in English, and ten out of thirty did so for the TOMAL-2. English language measures (WMLS-R English) were administered in English only, and Spanish language measures (WMLS-R Spanish) were administered in Spanish only.

Eye-Tracking Procedure
Participants binocularly viewed single paragraphs displayed in yellow text (14 pt. Courier New font, double-spaced) on a black background using Experiment Builder (SR Research Ltd., Ontario, Canada). Each paragraph was presented in its entirety on a 1,024 px × 786 px 21-in. screen positioned 70 cm from the participants, who maintained a fixed head position with the aid of a chin-rest. An EyeLink 1000 desk-mounted eye-tracker (SR Research Ltd., Ottawa, Canada) collected right-eye monocular data at a 1,000 Hz sampling rate. Calibration was performed before the start of each paragraph using a 9-point grid and repeated as necessary until the average fixation error was less than 0.5° of visual angle.

Eye-Tracking Data Preprocessing
Trial data were inspected and corrected for vertical drift. Fixations shorter than 80 ms and those outside of word-level interest areas were deleted from the base data. No upper bound was applied to fixation durations; the maximum fixation duration was 3,330 ms (the only duration above 3,000 ms). The next seven largest fixations fell between 2,000 and 2,550 ms, with all remaining durations shorter than 2,000 ms.
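The two exclusion rules can be expressed as a simple filter. The sketch below is illustrative only: the `Fixation` record and its field names are hypothetical, and drift correction is assumed to have been done beforehand.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_FIXATION_MS = 80  # threshold stated in the text

@dataclass
class Fixation:
    duration_ms: float
    interest_area: Optional[int]  # None when the fixation falls outside every word-level area

def clean_fixations(fixations: List[Fixation]) -> List[Fixation]:
    """Drop fixations shorter than 80 ms or outside word-level interest areas.

    No upper bound is applied, mirroring the preprocessing described above.
    """
    return [
        f for f in fixations
        if f.duration_ms >= MIN_FIXATION_MS and f.interest_area is not None
    ]

# Hypothetical trial: one short fixation, one off-word fixation, two valid ones.
raw = [
    Fixation(62, 3),
    Fixation(214, 4),
    Fixation(180, None),
    Fixation(3330, 7),
]
kept = clean_fixations(raw)
```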
Fifteen eye-movement measures were examined based on prior literature (Cop et al., 2015;Rayner, 2012;Whitford and Joanisse, 2018;Whitford and Titone, 2012;Whitford and Titone, 2017). Six of these were early stage, local (word-level) measures, which captured unconscious processing of the text during the first reading of the paragraph, also called the first pass or first run: first fixation duration and gaze duration, first pass mean saccade amplitude, first pass regressions out, and first pass word skipping. Nine late-stage eye-movement measures captured conscious integration of information and included all passes through the text: five at a local (word) level (mean fixation duration, total reading time, fixation count, regressions out, and mean saccade amplitude), and the remaining four at a global (trial) passage level (trial fixation count, saccade count, run count, and total trial time). Of these nine late measures, five (total fixation duration, total reading time, regressions out and run count and mean saccade amplitude) provided insight into late processing in specific, local areas of interest while four (total trial fixation count, saccade count, words skipped and trial duration) were indicators of global, or paragraph-level, text processing (see Table 4 for calculation of these measures).
As illustrated by these related measures, eye-tracking output produces high-dimensional data with resulting analytic challenges. On the one hand, the multiple eye measures provide different insights into the timing of cognitive processing. Particularly for syntactic processing, the cognitive processing may be observed in some measures and not others (Clifton et al., 2007;Rayner, 1998). On the other hand, analysis with multiple, correlated outcomes or predictors increases the likelihood of type I error in eye-tracking analysis (von der Malsburg and Angele, 2017). Data-driven dimensional reduction techniques provide one way to navigate the problems of dimensionality and multiple comparisons (Kuperman et al., 2018). The current study thus employed ordinal principal components analysis (princals in the Gifi R package, Mair et al., 2017) to extract shared variance from eye-tracking measures; the first principal component from the analysis then served as the single outcome variable representing reading efficiency in subsequent regression analysis. Lower values indicate more efficient reading performance.
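The idea of collapsing correlated eye-movement measures into one outcome can be sketched as follows. This is not the ordinal (princals) procedure used in the study: it substitutes ordinary linear PCA on standardized measures, computed by power iteration, and the three measures and their values are hypothetical. The sign of the component is fixed here, by assumption, so that longer gaze durations load positively and lower scores mean more efficient reading.

```python
import math
import random

def standardize(column):
    """Z-score one measure using the sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]

def correlation_matrix(z):
    """Correlations among already standardized measures."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def first_component(corr, iterations=500, seed=0):
    """Leading eigenvector of corr by power iteration, plus its variance share."""
    k = len(corr)
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(k)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(k) for j in range(k))
    return v, eigenvalue / k  # the trace of a correlation matrix equals k

# Hypothetical per-trial measures (six trials).
measures = {
    "gaze_duration_ms": [210, 250, 230, 300, 280, 220],
    "fixation_count": [90, 120, 100, 150, 140, 95],
    "skip_rate": [0.30, 0.18, 0.25, 0.10, 0.12, 0.28],
}
z = [standardize(col) for col in measures.values()]
loadings, variance_share = first_component(correlation_matrix(z))
if loadings[0] < 0:  # orient the component: longer gaze durations load positively
    loadings = [-x for x in loadings]
scores = [sum(l * zc[t] for l, zc in zip(loadings, z)) for t in range(6)]
```

With this orientation, the trial with the longest gaze durations, most fixations, and least skipping receives the highest score, and skipping loads with the opposite sign to gaze duration.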

Descriptive Statistics
Inspection of the raw data revealed that in most cases, the distribution of behavioral and eye-tracking data violated assumptions of normality and variance homogeneity, and in regression models, of sphericity and homoscedasticity. This analysis thus utilized non-parametric tests implemented in R (R Core Team, 2018) to compute basic descriptive correlations and first-level group comparisons. Specifically, the analysis employed Wilcoxon (Mann Whitney) signed rank tests (R package coin; Zeileis et al., 2008) and BCa (bias-corrected and accelerated) bootstrapped 95% confidence intervals to test differences in sample means (R package boot, Canty, 2002); Kendall's test of association (tau) to examine pairwise correlations among numeric variables of interest (in base R); one-way Kruskal-Wallis ANOVAs to examine associations between numeric and categorical variables of interest (in base R); and repeated measures, robust ANOVA (R package WRS2; Mair and Wilcox, 2016).
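As a rough illustration of the bootstrap step, the sketch below resamples two groups with replacement and reads a 95% interval for the difference in means off the sorted resampled differences. It uses a plain percentile interval, which is simpler than the BCa interval reported in the study, and the group scores, function name, and defaults are hypothetical.

```python
import random
import statistics

def bootstrap_mean_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for mean(a) - mean(b).

    Note: a percentile interval, not the BCa interval used in the study.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = [rng.choice(a) for _ in a]
        resample_b = [rng.choice(b) for _ in b]
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical scores for two groups of seven participants each.
group_a = [5, 6, 7, 6, 5, 7, 6]
group_b = [1, 2, 3, 2, 1, 3, 2]
ci_low, ci_high = bootstrap_mean_diff_ci(group_a, group_b)
```

An interval that excludes zero, as in this deliberately well-separated toy example, would be read as evidence that the group means differ.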

Linear Regression Modeling
In order to construct a regression taxonomy for online eye-tracking and offline comprehension outcomes separately, the best-fitting distribution of the outcome variable was first determined through visual inspection, substantive alignment, and likelihood ratio tests in R package family gamlss (Rigby and Stasinopoulos, 2005). The model taxonomy for analysis comprised a set of multilevel linear regression models using this best-fit outcome distribution with crossed random intercepts for subjects at the paragraph level through R package lme4 (Bates et al., 2015) and R package family gamlss (Stasinopoulos et al., 2017). To examine autocorrelation effects in the eye-tracking data (Baayen et al., 2017), the linear baseline model in each taxonomy was also fitted with cubic polynomial and p-spline smoothers for the behavioral variables of interest using gamlss. In each case, BIC model evaluation indicated that the linear model provided the best fit to the data; hence, all taxonomies represent linear mixed effects models.
For the taxonomy examining paragraph comprehension as outcome, model predictors were selected in three steps. First, age, maternal education, and nonverbal reasoning were entered a priori as control variables based on the established relationship between these variables and reading outcomes (Burchinal et al., 2002;Hoff, 2013;Peterson et al., 2017;Auerbach et al., 2019;Sorenson Duncan and Paradis, 2020). Second, the full predictor dataset was reduced through bidirectional stepwise regression minimizing generalized AIC (GAIC). Finally, variables identified in step two were corroborated using ridge regression with 10-fold cross-validation. Optimal model lambda was identified as the penalization factor yielding the lowest mean-squared error out of a range from 0.1 to 50. Ridge regression was considered to corroborate stepwise regression results if variables with ridge coefficients (i.e., effect sizes) greater than 0.10 were the same as those retained in the stepwise regression.
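The corroboration step can be sketched in a few lines. The example below is a simplified stand-in for the analysis described above: it assumes predictors and outcome are already centered and scaled (so no intercept is fitted), uses a hypothetical lambda grid between 0.1 and 50, assigns folds by row index rather than at random, and runs on fabricated toy data in which only the first predictor carries signal.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ridge_fit(X, y, lam):
    """Ridge coefficients from (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def cv_mse(X, y, lam, k=10):
    """Mean squared prediction error over k folds (fold = row index mod k)."""
    n = len(y)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        total += sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
                     for i in held_out)
    return total / n

# Hypothetical toy data: 40 rows, two orthogonal predictors, signal in the first only.
x1 = [1.0, 1.0, -1.0, -1.0] * 10
x2 = [1.0, -1.0, 1.0, -1.0] * 10
X = [[a, b] for a, b in zip(x1, x2)]
y = [0.8 * a + 0.1 * a * b for a, b in zip(x1, x2)]

grid = [0.1, 0.5, 1.0, 5.0, 10.0, 25.0, 50.0]
best_lambda = min(grid, key=lambda lam: cv_mse(X, y, lam))
coefficients = ridge_fit(X, y, best_lambda)
retained = [j for j, b in enumerate(coefficients) if abs(b) > 0.10]
```

The `retained` list plays the role of the set of variables with ridge coefficients above 0.10, which would then be compared against the predictors kept by the stepwise procedure.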
The model taxonomy examining reading efficiency as outcome included: 1) a baseline model in the full L1 English and L1 Spanish sample with word length, frequency, and predictability as control variables and scaled English behavioral predictors; 2) scaled text characteristic predictors were then added to the full sample model and replicated in 3) the sample of L1 Spanish speakers only. The final eye-tracking model for the L1 Spanish sample included scaled Spanish language behavioral assessments and statistically significant interactions between the Spanish language measures and text syntactic characteristics. Residual plots (residuals vs. fitted values, quantile, and residuals vs. leverage) for all models were examined to ensure that model assumptions were not violated.

Reading Comprehension
Our first hypotheses concerned cross-linguistic influence of component reading processes on offline comprehension of naturalistic English school texts of varying syntactic complexity. We first identified patterns in children's paragraph comprehension and the association of these patterns with specific syntactic characteristics of the paragraphs using multidimensional scaling (Borg et al., 2012). Next, behavioral measures that characterized paragraph comprehension outcomes were jointly plotted with these syntactic characteristics. Finally, regression taxonomies using behavioral predictors modeled paragraph comprehension outcomes.

Descriptive Statistics
On average for each paragraph, participants answered four out of the six open-ended questions correctly (mean accuracy = 0.68, sd = 0.19). As Table 5 illustrates, scores in the full sample were numerically lower for level 7 and 8 paragraphs than for levels 5 and 6, but scores on level 5 and 6 (post-hoc Hochberg family-wise error correction, p = 0.58) and on level 7 and 8 (Hochberg p = 0.06) paragraphs did not differ significantly from each other. When examined pairwise by paragraph without multiple comparisons correction, accuracy means differed across language groups only for the level 5 paragraph (Kruskal-Wallis χ² p = 0.03). Variability in scores differed across paragraphs (Mauchly sphericity test p < 0.001), but not across groups (Mauchly's p = 0.09), and there was no group by paragraph interaction (Mauchly's p = 0.77).

Linear Regression Modeling
Because multidimensional scaling is primarily an exploratory method that quantifies and visualizes dissimilarities among object scores, which in this study represented participants' accuracy in responding to paragraph questions, we next fit a taxonomy of regression models to the data in order to examine associations among behavioral predictors and paragraph comprehension in a linear regression framework. The first three models in the regression taxonomy focused solely on English language. The final model additionally incorporated behavioral measures of Spanish language skill in asking whether cross-linguistic vocabulary knowledge and cloze abilities explained variance in paragraph comprehension over and above English measures. As L1 Spanish and English speakers differed on average in levels of maternal education, the first model in taxonomy one included the control variables of age, nonverbal reasoning (KBIT-2), and maternal education. As displayed in Table 6, Model 1.1, neither the coefficient for age nor that for any level of maternal education reached significance (all p > 0.05), while nonverbal reasoning was positively associated with paragraph comprehension (β = 0.11, p < 0.001).
Model 1.2 next determined whether paragraph syntactic characteristics and behavioral measures predicted English paragraph comprehension when controlling for nonverbal reasoning, the only significant control variable from Model 1.1. Reading component predictors were word reading fluency, vocabulary, and cloze performance, all in English. The three syntactic characteristics of paragraphs in the Table 5 taxonomy (phrase complexity, clause complexity, and anaphoric reference) were quantified by conducting a nonlinear principal components analysis (PCA) for each characteristic.
As displayed in Model 1.2, L1 Spanish speakers overall (β = 0.07, p = 0.02) scored higher on average on paragraph responses when controlling for paragraph syntactic characteristics, nonverbal reasoning, English word fluency, English vocabulary, and English cloze. English vocabulary (β = 0.08, p = 0.003) and cloze (β = 0.06, p = 0.01) were positively associated with paragraph comprehension. Clause complexity was negatively related to paragraph response accuracy such that paragraphs with higher clause complexity were associated on average with lower accuracy (β = −0.10, p < 0.001). Phrase complexity (β = 0.001, p = 0.96) and anaphoric relations (β = −0.003, p = 0.82) were not significantly associated with paragraph comprehension.
Because prior research provides conflicting findings on the role of word reading performance in L2 readers, Model 2.1 next examined the Model 1.2 predictors in a reduced sample of only L1 Spanish speakers. Model 2.1 demonstrates that across the subsample and when considering English language performance and cognitive and demographic measures, results mirror those of the full sample, with English vocabulary (β = 0.11, p = 0.001) and cloze (β = 0.07, p = 0.02) as the only significant behavioral predictors of accurate paragraph comprehension when controlling for nonverbal reasoning and English word fluency abilities.
Model 2.2 then determined whether Spanish vocabulary and cloze accuracy additionally contributed to explaining variance in English paragraph comprehension for L1 Spanish speakers. When these predictors were entered into a new model that excluded the non-significant syntactic characteristics in Model 2.1, English vocabulary positively predicted paragraph comprehension (β = 0.10, p = 0.007) just as in the prior model, while Spanish vocabulary was negatively associated with comprehension (β = −0.47, p < 0.001), such that higher levels of Spanish vocabulary knowledge predicted lower accuracy on the comprehension questions. In contrast to Spanish vocabulary, however, Spanish cloze accuracy in Model 2.2 indicated a positive relationship, such that stronger Spanish cloze abilities predicted greater paragraph comprehension in L2 English (β = 0.26, p = 0.009) when controlling for English and Spanish vocabulary levels. Furthermore, as seen in the final model of the taxonomy, Model 2.3, which removes nonsignificant predictors in the interest of parsimony, there was a significant interaction, illustrated in Figure 3, between Spanish cloze scores and clause complexity (β = 0.19, p < 0.001), such that higher Spanish cloze scores attenuated the negative association of paragraph clause complexity with paragraph comprehension.

Eye Movement Reading Behavior
Using syntactic elements identified based on the text analysis, we next asked what kind of relationship between eye movement measures, syntax, and L1 and L2 reading skills is observed in adolescents when reading naturalistic school texts in English. This second set of analyses identified patterns in adolescents' eye-tracking measures and the association of these patterns with specific syntactic characteristics of the stimulus paragraphs using multidimensional scaling (Borg et al., 2012). Next, behavioral measures that characterized paragraph comprehension outcomes were jointly plotted with these syntactic characteristics. Finally, regression taxonomies using syntactic and behavioral predictors were used to model eye-tracking outcomes.

Descriptive Statistics
Overall, first fixation and mean fixation durations at the word level were comparable for both language groups (Kruskal-Wallis p = 0.87). In addition, they did not differ in either first pass (p = 0.84) or total (p = 0.96) regressions at the word level. However, as Table 7 demonstrates, and consistent with prior findings on L2 reading (Whitford and Joanisse, 2018;Cop et al., 2015), L1 Spanish speakers engaged in significantly more fixations on average at the word (p = 0.001) and trial (p = 0.002) levels, contributing to a concomitantly longer average trial time (p = 0.003). L1 Spanish speakers also skipped fewer words, on average, in both first pass (p = 0.008) and total reading (p = 0.001), and their saccades were correspondingly shorter in both first pass (p = 0.002) and total reading (p = 0.002) than those of their L1 English counterparts. The group differences displayed on paragraphs overall also held true for paragraphs when examined separately (all uncorrected Kruskal-Wallis p < 0.05).

Regression Modeling
Given the substantively interrelated nature of these measures and their high correlation, as well as loadings on a single PCA dimension, the eye-tracking measurement regression taxonomies used the first principal components dimension, accounting for 53.4% of variance in the 15 eye-tracking measures described above as the outcome variable. All eye measurement variables loaded on this first dimension such that variables typically positively associated with faster and more proficient reading (e.g., word-skipping, saccade amplitude) loaded with opposite sign to variables typically negatively associated with proficient reading (e.g., gaze durations, regressions, number of fixations), which loaded negatively. Secondary analysis using single eye movement measures (first run gaze duration and total gaze duration) aligned overall with the dimension one regression findings. Lower values of the dimension one measure were therefore indicative of faster and more efficient reading, while higher values indicated slower reading with more fixations and regressions.
Because word characteristics such as length (e.g., Rayner, 2009), frequency (e.g., Rayner and Raney, 1996), and predictability (e.g., Ehrlich and Rayner, 1981) have been widely shown to impact a variety of eye movement measures, Model 3.1 tested the importance of these measures when aggregated at the paragraph level in these experimental stimuli (Table 7). Behavioral and syntactic predictors for the taxonomy in Table 8 were constructed in the same way as for the preceding analysis in Table 5. In Model 3.1, only average word predictability was significantly associated with reading efficiency (β = −0.80, p = 0.04); however, variation accounted for by word predictability was collinear with syntactic complexity measures and was therefore not modeled in the remainder of the taxonomy. L1 Spanish speakers, on average, demonstrated reading efficiency outcomes 0.60 of a standard deviation higher than those of L1 English speakers (β = 0.60, p < 0.001) in Model 3.1 when controlling for nonverbal reasoning, English word fluency, and English vocabulary. Unlike in the paragraph comprehension models, English word fluency in Model 3.1 was a significant predictor of reading efficiency (β = −0.48, p < 0.001). However, English vocabulary was not (β = 0.05, p = 0.33), and only English word fluency was associated with lower values of reading efficiency (i.e., shorter gaze durations, faster reading times, fewer saccades, and longer saccade amplitudes) when controlling for L1 and for nonverbal reasoning.
Model 3.2 next determined whether the syntactic predictors of paragraph comprehension were also related to reading efficiency. In this model, neither phrase (β = −0.04, p = 0.45) nor clause (β = −0.04, p = 0.39) complexity predicted reading efficiency; however, on average in the full sample, anaphoric reference (i.e., the proportion of pronouns and demonstratives, and the ratio of pronouns to nouns) was associated with better reading efficiency (β = −0.08, p = 0.006). When re-examined in Model 4.1 with L1 Spanish speakers only, results mirrored those from the full sample, with anaphoric reference similarly associated with better reading efficiency in L2 (β = −0.10, p = 0.01).
As in the paragraph response taxonomy, Model 4.2 next determined whether Spanish language and reading skills additionally contributed to explaining variance in the outcome measure. When Spanish predictors were entered into a new model, the coefficient for Spanish vocabulary was once again positive (β = 1.13, p < 0.001) such that a larger Spanish vocabulary predicted worse reading efficiency (i.e., longer fixation durations, longer reading times, shorter saccades, and less word skipping). In contrast, Spanish cloze in Model 4.2 displayed a negative coefficient (β = −1.20, p < 0.001), indicating that higher standardized scores in Spanish cloze were associated with shorter fixations, faster reading times, longer saccades, and more word skipping. Furthermore, in the final model, Model 4.3, illustrated in Figure 4, there was a trending, but nonsignificant, positive interaction between the anaphoric reference measure and Spanish cloze (β = 0.18, p = 0.08), such that in passages with more anaphoric references, higher levels of Spanish cloze were trending toward, but not significantly associated with, lower decrements in reading efficiency, as compared to passages with fewer anaphoric references.

DISCUSSION
This study examined the contributions of cross-linguistic sentence integration skills as well as the lexical skills of decoding and vocabulary knowledge to the processing and comprehension of naturalistic English texts in adolescents with either English or Spanish as their L1. In particular, it focused on the modulation of these skills by syntactic structures that have been demonstrated to challenge middle-school readers, namely, complex noun phrases, complex clauses, and anaphoric references. Middle-schoolers who spoke English or Spanish as an L1 read nonfiction passages in English that were leveled from grades 5 to 8 while eye-tracking measures were collected to assess reading efficiency (fluency and speed of reading). Participants also answered post-reading questions about the passages as a measure of comprehension. We hypothesized that greater L1 and L2 proficiency in sentence integration would be associated with higher levels of L2 text comprehension and that phrase complexity, clause complexity, and anaphoric references would all negatively modulate the expected positive effect of cross-linguistic sentence integration skills on offline text comprehension. English sentence integration and vocabulary scores were positive predictors of English text comprehension, and this relationship was similar for both L1 English and L1 Spanish speakers. This result is consistent with prior studies of English learners' reading development (e.g., Lesaux et al., 2007;Lesaux and Harris, 2017) and with most reading comprehension models, including the RSF (Perfetti and Stafura, 2014), which generally highlight the important role of skills in the language of the text in both reading fluency and comprehension.
We expected that L1 sentence integration abilities would support L2 text comprehension, and this was indeed the case when controlling for the effect of Spanish L1 vocabulary. Prior literature has found weak or nonexistent associations of L1 sentence integration with L2 reading among students in L2 dominant school environments but more consistent positive associations among students in bilingual education settings where students receive academic instruction in both languages. Although the L1 Spanish speakers in our sample, except for two in bilingual English-Spanish schools, attended almost entirely English-speaking schools, they still retained relatively balanced or Spanish-dominant language skills when we tested them in middle school. Our biliterate sample was also reported by parents to consistently engage in reading in Spanish at home, with an average of 34% of daily reading reported to take place in Spanish. Thus, in spite of educational settings similar to those found in prior studies with no CLI effects in text reading, our sample possessed proficiency in L1 Spanish reading that may have led to outcomes more similar to students in bilingual education than ones in English-only instruction. If the transfer of a common underlying proficiency, as in Cummins' (1979) proposal, requires some minimal level of L1 abilities, the home language and literacy experiences in our sample appear to have sustained at least a level of Spanish proficiency that supported the transfer of sentence integration skills to English reading.
The positive association of L1 sentence integration skills with reading was not limited to offline, post-reading comprehension but was also seen in reading efficiency, where higher levels of Spanish sentence integration were associated with faster and more efficient online text processing. The parallel findings in both online behavior (seen in eye-tracking) and offline behavior (seen in the post-reading comprehension task) suggest that CLI of L1 Spanish sentence integration is not, or is not solely, a post-reading process of reasoning or strategizing about the comprehension questions, or of re-creating a representation of the text meaning in relation to those questions after reading. Instead, these triangulated results suggest that CLI occurs during the reading process as readers are integrating words into larger sentence and text representations. For monolingual models of text processing and comprehension such as the RSF, these findings indicate that linguistic resources beyond the language of the text can support this integration process even when lexical knowledge in that language, such as reflected in our L1 Spanish speakers' English vocabulary scores, may not provide the high-quality lexical representations of meaning that are called for in the RSF to support sentence- and text-level meaning integration. While sentence integration in the RSF model is generally assumed to be a within-language skill supporting word-to-text integration, results from both the behavioral and eye-movement analyses thus indicate that a cross-language sentence integration competency beyond vocabulary knowledge may also support text processing and comprehension in an L2.

RQ2: Does Syntactic Complexity of the Text Modulate the Association of L1 Sentence Integration Skills With L2 Text Reading Efficiency and Comprehension?
Because syntactically complex texts present greater difficulty in both L1 and L2 reading, we had hypothesized that syntactic complexity of the text would modulate the association between L1 sentence integration skills and both text processing and comprehension. Our results demonstrate a significant negative main effect of syntactic complexity on paragraph comprehension. In other words, more complex texts, in terms of clause complexity, were more difficult to comprehend both for the full sample and for L1 Spanish speakers when examined separately. Further, clause complexity positively moderated the relationship of Spanish sentence integration skills with offline comprehension such that the positive association of Spanish sentence integration with L2 comprehension was more pronounced in texts with greater clause complexity. In the least complex passages, there was no evident relationship between L1 sentence integration and paragraph comprehension while in the most complex passages, higher L1 sentence integration skills were associated with better paragraph comprehension. L2 comprehension thus appears to draw more heavily on sentence integration skills developed in the L1 when sentences are more structurally complex and difficult to understand.
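The moderation pattern described above can be made concrete with a generic interaction model. The following is only an illustrative sketch: the symbols (SI for L1 sentence integration, CC for clause complexity, the link function g, and the random intercepts u and v for participant and passage) are assumed notation introduced here, and covariates such as vocabulary are omitted for brevity. It is not the exact specification fitted in the study.

```latex
% Illustrative moderation model; all symbols are assumed notation, not the study's own.
% Y_{ij}: comprehension outcome for participant i on passage j; g: link function.
\[
g\big(\mathbb{E}[Y_{ij}]\big) = \beta_0 + \beta_1\,\mathrm{SI}_i + \beta_2\,\mathrm{CC}_j
  + \beta_3\,(\mathrm{SI}_i \times \mathrm{CC}_j) + u_i + v_j
\]
% Simple slope of L1 sentence integration at a given level of clause complexity:
\[
\frac{\partial\, g\big(\mathbb{E}[Y_{ij}]\big)}{\partial\, \mathrm{SI}_i} = \beta_1 + \beta_3\,\mathrm{CC}_j
\]
```

Under this sketch, a negative main effect of clause complexity corresponds to \beta_2 < 0, and positive moderation corresponds to \beta_3 > 0. The slope of sentence integration then grows with clause complexity, which would produce the reported pattern of little or no association in the least complex passages and a positive association in the most complex ones.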
In online text processing, we again found that syntactic characteristics of the text were associated with reading efficiency. However, unlike for comprehension, better reading efficiency was predicted by the greater presence of anaphoric references and not by clause complexity. In other words, passages with more anaphoric references were read faster and more efficiently than those with fewer such references. In the study passages, these text features may have provided links among concepts in the text that led to more fluent and efficient online text processing. We also found a non-significant trend for the interaction of anaphoric references and L1 sentence integration skills, raising the question of whether, in a larger sample or a longer paradigm, these text characteristics would modulate the facilitative main effect of L1 sentence integration on text processing efficiency.
While the present exploratory study cannot identify mechanisms for CLI in complex text, the RSF was formulated as a bottom-up, memory-based integration model, suggesting that beyond lexical representation, individual differences in short-term and working memory capacities could explain comprehension outcomes. Results of the present study do not support a memory-based explanation of individual differences, as our measures of sentence integration in both Spanish and English were written and untimed, and in group comparisons, L1 Spanish and L1 English speakers were matched on verbal working memory and nonverbal reasoning. However, the study findings do suggest that in comprehending text with complex clauses that are likely to tax memory resources, bilinguals may be drawing upon expectation-based meaning construction skills measured by our sentence integration task. These skills may be specific to features shared by Spanish and English or shared in the academic register. For example, while Spanish and English share a canonical subject-verb-object word order in simple sentences, we might speculate that the relative flexibility of Spanish regarding word order could facilitate comprehension of variation in word order in academic language marked by complex clauses. For Spanish speakers, a bias toward disregarding canonical word order in complex clauses, or toward attending to lexical cues in anaphoric references, could thus conceivably facilitate text processing and comprehension in texts containing anaphors and complex clausal structures. The crosslinguistic influence, as well as the modulatory effect of text complexity on CLI demonstrated here, may thus be specific to a feature such as word order, or to combinatorial similarities in a particular language, Spanish, or pair of languages, Spanish and English.
Alternatively, crosslinguistic influence might arise from language-general factors such as general skills with statistical learning, predictive processing, parsing, or inferencing. Future studies that employ more than one task, or more specific tasks, may disentangle these or other possibilities.

CONCLUSION
Our findings extend monolingual reading models such as the RSF by providing evidence at one level, that of word-to-sentence and text integration, of the cross-linguistic influence of L1 sentence integration skills in L2 reading. Further, the cross-linguistic resources reflected in bilingual text processing and comprehension are dynamic and modulated by syntactic complexity of the text. Bilingual resources in integrating word meanings may thus be sensitive to text-level features not seen in studies of single word and sentence processing. The present study thus also extends word-level models in bilingual research, which propose a unified model of language processing, by carrying the implications of these lexical models over to sentence integration in the comprehension of connected text. Results also suggest that sentence integration may involve unified components beyond the lexicon. Future research may thus investigate how monolingual construction-integration models of reading comprehension, such as the RSF, can incorporate bilingual processing and cross-linguistic influence, not only at the lower levels of phonology, orthography, morphology, and the lexicon, but also in integrative processes that are involved in comprehending text features such as the complex clauses and anaphoric references examined in this study.
Because the study included only one global measure of sentence integration, a cloze task, one limitation is that it cannot differentiate among multiple forms of sentence integration that may have contributed to the results, such as word order sequencing, parsing, or referential association. In addition, there was substantial heterogeneity among the L1 Spanish speakers in terms of age of English acquisition and Spanish language ability. This heterogeneity reflects the linguistic and demographic mix in many United States schools and was partially accounted for using mixed effects modeling instead of simple group comparisons. However, it also demands caution in interpreting the effects of predictors that are correlated in the sample, such as English and Spanish vocabulary. Future research might attend to age of acquisition and language skill differences in a larger and longitudinal sample of L2 readers.
In summary, the current study suggests that L1 sentence integration skills can facilitate L2 reading efficiency and comprehension. L1 sentence integration skill in particular appears to support comprehension of complex clauses, even when the text is in an L2. The study adds to prior educational research on cross-linguistic influence on reading outcomes using a sample representative of United States public schools, which are largely English-only and do not provide home language literacy instruction. In spite of the English-only educational background of our sample, we found a facilitatory CLI similar to that found in prior studies of children in bilingual immersion education settings, suggesting that when home language skills are developed and maintained outside of the school context, these skills can help support efficiency and comprehension of L2 reading tasks required of children at school. As a supplement to bilingual education, practices and policies that support family- or community-based, out-of-school home language development may thus also support minority language speakers' success in L2 education.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Harvard University Committee on the Use of Human Subjects (IRB16-0866). Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
SL contributed to conceptualization, data collection and curation, designed and conducted the statistical analysis, and wrote the manuscript. VW contributed to conceptualization, data curation, and reviewed and edited the manuscript. LM contributed to conceptualization, data collection, and reviewed and edited the manuscript. GL contributed to conceptualization, funding acquisition, reviewed and edited the manuscript, and supervised the project. All authors contributed to the article and approved the submitted version.

FUNDING
We would like to thank Dr. James Ryan, former Dean of the Harvard Graduate School of Education, as well as the Natural Sciences and Engineering Research Council of Canada (RGPIN-2020-05052), for providing financial support for the study (both awarded to GL). The study was carried out at the Harvard Center for Brain Science and involved the use of instrumentation supported by the NIH Shared Instrumentation Grant Program (S10OD020039).

ACKNOWLEDGMENTS
We would like to express our sincerest gratitude to the children and families who participated in the research. We also thank the dedicated BEE lab members who assisted in recruiting and interacting with the study participants.