Discourse-Level Information Recall in Early and Late Bilinguals: Evidence From Single-Language and Cross-Linguistic Tasks

Bilingualism research indicates that verbal memory skills are sensitive to age of second language (L2) acquisition (AoA). However, most tasks employ disconnected, decontextualized stimuli, undermining ecological validity. Here, we assessed whether AoA impacts the ability to recall information from naturalistic discourse in single-language and cross-linguistic tasks. Twenty-four early and 25 late Chinese-English bilinguals listened to real-life L2 newscasts and orally reproduced their content in English (Task 1) and Chinese (Task 2). The groups were compared in terms of recalled information (presence and correctness of idea units) and key control measures (e.g., attentional skills, speech rate). Across both tasks, information completeness was higher for early than late bilinguals, irrespective of attentional speed, speech rate, and other relevant factors. These results bridge the gap between classical memory paradigms and ecological designs in bilingualism research, illuminating how particular language profiles shape information processing in daily communicative scenarios.


INTRODUCTION
Age of second language (L2) acquisition (AoA) has been shown to influence various memory processes in bilinguals (Yoo and Kaushanskaya, 2016; Volkovyskaya et al., 2017; Macmillan et al., 2021). However, most studies have employed random or arbitrary sequences of disconnected stimuli, failing to assess whether AoA impacts a critical aspect of daily communication: the ability to recall information from unfolding discourse. To bridge this gap, we compared the performance of early and late bilinguals (EBs and LBs) on two naturalistic recall tasks with low and high processing demands.
Information recall involves evoking recent or distant events and construing them through language (Rubin and Umanath, 2015), as when we (re)tell a story, a piece of news, or an anecdote. Though typically studied with lists of disconnected stimuli (Raman et al., 2018; Kilecioğlu et al., 2020; Macmillan et al., 2021), this domain can be fruitfully tapped through two naturalistic tasks: single-language recall (SLR) and consecutive interpreting (CI). In SLR tasks, participants are presented with pieces of discourse and asked to recount their contents as exhaustively as possible, in the same language (Hiltunen and Vik, 2017; Prichard and Christman, 2017; Newberry and Bailey, 2019). CI, for its part, requires listening to stretches of continuous speech in one language and rendering them into another language after a delay (Hamidi and Pöchhacker, 2007; Choi, 2013; Liu, 2013; Wu, 2013). Both tasks allow for the use of notes to aid information retrieval, but they differ in their overall cognitive demands, with CI proving more stringent than SLR due to the added challenges of cross-linguistic processing (Hiltunen et al., 2016).
Although these two tasks have informed several topics in the bilingualism agenda (Hiltunen and Vik, 2017; Dong et al., 2018; García, 2019; Newberry and Bailey, 2019), to our knowledge no study has examined how AoA affects performance on them. Yet, evidence from non-naturalistic studies shows that EBs outperform LBs in cued word (Yoo and Kaushanskaya, 2016; Macmillan et al., 2021) and picture (Volkovyskaya et al., 2017) recall tasks, suggesting that the same may occur with unfolding texts. Moreover, recall of discourse-level information can be influenced by expertise in specific bilingual skills (Hiltunen and Vik, 2017), indicating that this domain can indeed be shaped by subject-level variables in this population.
Importantly, to effectively capture the impact of AoA on information recall, key potential confounds need to be addressed. First, given that AoA may correlate with other bilingual-experience-related variables that impact information recall (e.g., L2 proficiency, time of exposure; Oh et al., 2019; López, 2021), relevant factors should be accounted for at the group-formation stage, ideally through exhaustive, validated instruments. Second, AoA is known to modulate speech rate, a factor that could affect information delivery during testing (Guion et al., 2000; Saito, 2015a). Third, AoA can affect attentional skills (Kapa and Colombo, 2013), which may critically influence text-level processing by favoring concentration on, and appraisal of, both key and secondary information (Meppelink and Bol, 2015; Sauer and Hope, 2016). Therefore, a robust test of our hypothesis should directly tackle these issues.
Against this background, we examined whether information recall is affected by AoA in text-level tasks with low (SLR) and high (CI) cognitive demands. Our analyses focused on a validated completeness measure (exhaustiveness and precision in information recall). To account for key potential confounds, we ensured that both groups were systematically matched across multiple bilingual-experience variables using a validated tool (Schaeffer et al., 2020), while empirically addressing the role of speech rate and attentional speed as potential modulators of participants' outcomes. Based on previous findings, we hypothesized that, across both tasks, information would be better recalled by EBs than LBs. With this approach, we aim to shed new light on how AoA influences bilinguals' abilities to recall verbal information from naturalistic discourse beyond isolated stimuli.

Participants
The study comprised 49 second-year students (44 female) from a translating and interpreting program at the University of Electronic Science and Technology of China, recruited for course credit. They had received around 64 h of consecutive interpreting classes before testing. A power estimation analysis with G*Power 3.1 (Faul et al., 2009) for an ANCOVA (α = 0.05, power = 0.80, ηp² = 0.25) indicated that reliable effects could be detected with 26 participants. Our actual sample size yielded a power of 0.97. All participants were native Chinese speakers who had learned English through formal education or private tutoring. Their ages ranged from 20 to 26 years. All were right-handed, and none had a history of neurological or psychiatric disorders.
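The link between the effect size entered into G*Power and the resulting sample estimate can be sketched as follows. This is a minimal illustration, not a reproduction of G*Power. It converts ηp² to Cohen's f and then uses a normal approximation for a two-group, 1-df comparison without a covariate. That simplification is an assumption and slightly overstates power relative to an exact noncentral F computation. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def eta_sq_to_f(eta_sq):
    """Convert (partial) eta squared to Cohen's f: f = sqrt(eta^2 / (1 - eta^2))."""
    return math.sqrt(eta_sq / (1.0 - eta_sq))


def approx_power(f, n_total, alpha=0.05):
    """Normal approximation to the power of a two-sided, two-group (1 df) test.

    With two equal groups, Cohen's d = 2f and the noncentrality is
    d * sqrt(n_per_group / 2). Degrees of freedom and covariates are ignored
    (an assumption), so power is slightly overstated versus an exact noncentral F.
    """
    d = 2.0 * f
    delta = d * math.sqrt((n_total / 2.0) / 2.0)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(delta - z_crit)


def approx_required_n(f, target_power=0.80, alpha=0.05):
    """Smallest even total N (two equal groups) whose approximate power reaches the target."""
    n_total = 4
    while approx_power(f, n_total, alpha) < target_power:
        n_total += 2
    return n_total


f = eta_sq_to_f(0.25)
print(round(f, 3))                    # 0.577
print(approx_required_n(f))           # 24
print(round(approx_power(f, 49), 3))  # 0.981
```

Under these assumptions the sketch returns N = 24 and a power of about 0.98 for N = 49. These figures are close to, but slightly more optimistic than, the 26 participants and 0.97 power reported above, because the approximation ignores the error degrees of freedom and the covariate.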
Following a well-established strategy for group formation in bilingualism research (Bartolotti et al., 2011; Vukovic, 2013; Bialystok and Shorbagi, 2021), participants were separated into EBs and LBs based on the median AoA of the whole sample. Crucially, this median value (age 7) is a typical cut-off reported in the literature (Sabourin et al., 2014; Delcenserie and Genesee, 2017; Kousaie et al., 2017; Claussenius-Kalman et al., 2020). Moreover, the resulting groups were well balanced (24 EBs and 25 LBs) and strictly matched for critical sociodemographic, cognitive, and language-profile factors (Table 1). Specifically, data acquired before the experimental session through the TICQ¹ (Schaeffer et al., 2020) showed that the two groups differed significantly in AoA but were matched in terms of age, sex, language competence, interpreting competence, weekly interpreting practice, and key cognitive dimensions, except for attentional speed, which was entered as a covariate in all analyses (Table 1). The tasks used in the cognitive assessment protocol are detailed in the Supplementary Material.
¹ Note that, unlike other language competence tools, the TICQ has been statistically validated across different languages. It also captures important factors that are absent in other instruments (e.g., weekly dedication to CI and translation in both directions, and competence in CI and translation in both directions).
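A median split of this kind can be sketched as below. The participant IDs and AoA values are hypothetical. The tie rule is also an assumption: here, participants whose AoA equals the median go to the late group, and the section does not state how ties at age 7 were assigned.

```python
import statistics


def median_split(aoa_by_participant):
    """Split participants into early (EB) and late (LB) bilinguals at the sample median AoA.

    Hypothetical tie rule: AoA strictly below the median -> "EB"; at or above -> "LB".
    """
    cutoff = statistics.median(aoa_by_participant.values())
    groups = {
        pid: "EB" if aoa < cutoff else "LB"
        for pid, aoa in aoa_by_participant.items()
    }
    return cutoff, groups


# Hypothetical AoA values (years), not the study's data.
sample = {"p01": 4, "p02": 6, "p03": 7, "p04": 9, "p05": 12}
cutoff, groups = median_split(sample)
print(cutoff)  # 7
print(groups)  # {'p01': 'EB', 'p02': 'EB', 'p03': 'LB', 'p04': 'LB', 'p05': 'LB'}
```

Because many participants can share the median value, the tie rule alone can shift group sizes noticeably. Checking the resulting counts is therefore a quick sanity check on group balance.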

Experimental Procedure
Participants performed two tasks in a fixed order: (i) an auditory L2 SLR task, tapping text comprehension and memory (Bernhardt, 1991; Vander Beken et al., 2020); and (ii) an L2-to-L1 CI task, capturing interlingual reformulation skills (De Groot and Comijs, 1995; Christoffels et al., 2013). In both cases, they received oral instructions in Chinese (their L1). They sat at desks in a dimly lit language lab, each facing a desktop computer, with no distractions. Stimuli for both tasks were presented binaurally via stereo headsets. Recordings were played only once. The protocol lasted roughly 20 min.

SLR Task
The auditory SLR task was based on a news report about hygiene efforts in Africa during the COVID-19 pandemic. The report was delivered by a female speaker of American English. The audio clip was downloaded from the Voice of America website and saved as a high-quality stereo mp3 file (44.1 kHz). It lasted 153 s and contained 352 words, produced at a rate of 2.27 words per second. Participants were instructed orally to listen to the recording and then verbally reproduce, in their L2, as much information as they could, with all the details they could remember. Each participant adjusted the volume to a comfortable level before the test began. They were allowed to take down any notes they deemed necessary with paper and pencil. As in previous studies (Çakmak and Erçetin, 2018; Vander Beken and Brysbaert, 2018; Vander Beken et al., 2020), they were asked to focus on capturing the text's ideas in their own words rather than provide verbatim renditions. The participants' speech was recorded through their headsets' built-in microphones and saved as high-quality mp3 files on the Xima 3,100 Digital Integrated Language Teaching System V2.0.

CI Task
The CI task was based on a news report about a virtual reality spa experience, narrated by a woman in American English. The audio file was also downloaded from the Voice of America website and had the same specifications as the one used in the previous task. The speech lasted 176 s and comprised 295 words delivered at a rate of 2.54 words per second.
Participants were instructed to listen to the recording and interpret it consecutively into their L1. To emulate real-life performance conditions, the recording was stopped twice, yielding segments of roughly 60 s each (pauses were made at the same points of the recording for all participants). Participants were asked to start interpreting into L1 once the recording stopped, and they were allowed to take notes with paper and pencil at will. Their production was recorded exactly as described for the previous task.

Source-Text Description
In line with reported procedures (Vander Beken and Brysbaert, 2018; Vander Beken et al., 2020), the source texts in each task were first divided into idea units, namely, utterances (typically, phrase-sized constructions) that express a complete idea and contain an actual or tacit verb (Mills et al., 1993; Schiefele and Krapp, 1996; Roediger and Karpicke, 2006; Vander Beken et al., 2020). The number of idea units and words, as well as the recordings' durations and speech rates, are shown in Table 2.
Table 1 notes: (b) Based on Posner's attention task (Posner, 1980). (c) Based on validated procedures; not part of the TICQ (Schaeffer et al., 2020).

Speech Transcription
Recordings were first automatically transcribed via iFlytek, a software package reported to provide 97.5% accuracy in English speech recognition (Li et al., 2015; Wang et al., 2018). Three independent copy-editors then checked each transcribed file against its recording to ensure optimal quality. The few instances requiring correction were resolved by consensus among all three copy-editors. Transcriptions were saved as .doc files for further processing and analysis.

Information Completeness
This measure captures coarse-grained processes encompassing listening comprehension, higher-order cognitive processing, memory retention, and linguistic reformulation skills (Craik and Lockhart, 1972; Çakmak and Erçetin, 2018; Vander Beken et al., 2020). We first followed standard procedures (Mills et al., 1993) to identify idea units in each naturalistic text. The protocol relies heavily on verbs as the central point of an idea unit. This approach reflects a key point made in leading linguistic theories (e.g., Systemic Functional Grammar), which frame verbs (and their associated processes) as the key organizing element in the lexico-grammatical and semantic structure of a clause (Halliday, 1994; Halliday and Matthiessen, 2014). We then followed validated scoring protocols to rate the presence and correctness of idea units in each transcription, without focusing on any specific lexical class or construction type (Roediger and Karpicke, 2006; Vander Beken and Brysbaert, 2018; Vander Beken et al., 2020). Each unit was given 1 point if correctly recalled, 0.5 points if partially recalled, and 0 points if incorrectly recalled or omitted. The proportion of correctly recalled information was then calculated as the total score divided by the total number of idea units. As in previous work (Roediger and Karpicke, 2006; Vander Beken and Brysbaert, 2018; Vander Beken et al., 2020), two raters scored all recall protocols independently, and a third rater resolved all discrepancies to reach agreement. Inter-rater agreement reached 96% of units for the SLR task and 94% for the CI task; the remaining units (4% and 6%, respectively) were resolved by the third rater.

Control Measures
Speech Rate
Speech rate is a fluency measure calculated as the number of words produced per minute (Hulme et al., 1984; Trofimovich and Baker, 2006; Polyanskaya et al., 2017). The number of words in each transcript was counted in Microsoft Word, and the duration of each recording was obtained from the corresponding file's properties. The same procedure was used in both tasks.
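The words-per-minute computation can be sketched as follows. Words are approximated here as whitespace-delimited tokens. This is an assumption that suits English transcripts and may differ slightly from Microsoft Word's count. It does not apply to unsegmented Chinese output, which would need a different counting unit.

```python
def words_per_minute(transcript, duration_seconds):
    """Speech rate in words per minute.

    Words are approximated as whitespace-delimited tokens (an assumption that
    suits English; it may differ slightly from Microsoft Word's count and does
    not apply to unsegmented Chinese text).
    """
    n_words = len(transcript.split())
    return n_words * 60.0 / duration_seconds


print(words_per_minute("so the main idea is that hygiene matters", 4.0))  # 120.0
```

Multiplying before dividing (`n_words * 60.0 / duration_seconds`) keeps simple cases exact in floating point.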

Additional Measures for the CI Task
We also considered two CI quality measures (Hamidi and Pöchhacker, 2007). In both cases, scores were provided by two independent annotators, and the few discrepancies that emerged were resolved by a third annotator to reach consensus. All measures were calculated for each text segment.
First, we measured delivery by quantifying pauses and disfluencies. Following Hamidi and Pöchhacker (2007), we first calculated the mean frequency of filled pauses (interjections such as "um," "uh," and "hmm," or fillers between utterances; Brennan and Williams, 1995) and unfilled pauses (pauses not "filled" by a hesitation form) in each recording. All annotations were marked with Adobe Audition (version 13). We then calculated, in each recording, the mean frequency of three disfluency metrics: (a) false starts (interruption of a sentence followed by another complete sentence with a change in meaning), (b) repetitions (unwarranted reiteration of a word or phrase, usually after a pause), and (c) slips of the tongue (deviations from the intended form of an utterance).
Second, we measured quality of expression in terms of grammatical, syntactic, and lexical errors (Hamidi and Pöchhacker, 2007). In line with previous research (Levenston, 1979; Ting et al., 2010), we targeted the following variables: (a) misformation (use of wrong word forms or structures), (b) wrong sentence structure [lack or misuse of a subject and/or a finite (+tense) verb, and/or an independent clause], and (c) wrong word selection (non-native-like word combinations).
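Tallying the delivery and expression metrics per segment can be sketched as follows. The sketch assumes that annotators produce one list of category labels per interpreted segment. It also assumes that "mean frequency" means the average count per segment; a per-minute rate would instead divide by segment duration. The label names are hypothetical.

```python
from collections import Counter

# Hypothetical annotation labels for the delivery and expression metrics above.
DELIVERY = ("filled_pause", "unfilled_pause", "false_start", "repetition", "slip_of_tongue")
EXPRESSION = ("misformation", "wrong_sentence_structure", "wrong_word_selection")


def mean_per_segment(segments, categories):
    """Mean count per segment for each category.

    `segments` holds one list of annotation labels per interpreted segment
    (e.g., the three CI segments produced by one participant).
    """
    counts = [Counter(labels) for labels in segments]
    return {cat: sum(c[cat] for c in counts) / len(counts) for cat in categories}


segments = [
    ["filled_pause", "filled_pause", "repetition", "misformation"],
    ["unfilled_pause", "filled_pause"],
    ["wrong_word_selection"],
]
delivery = mean_per_segment(segments, DELIVERY)
expression = mean_per_segment(segments, EXPRESSION)
print(delivery["filled_pause"])           # 1.0
print(round(expression["misformation"], 3))  # 0.333
```

`Counter` returns 0 for absent labels, so categories that never occur in a segment still contribute a zero to the mean.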

Statistical Analysis
Information completeness and speech rate were compared between groups in each task via independent-measures ANOVAs. Given that EBs and LBs differed in attentional speed (cued reaction time from Posner's task; see Table 1), contrasts yielding significant differences were reanalyzed via ANCOVAs with attentional speed as a covariate, to test whether the effect was driven by this factor. As in previous work on SLR and CI (Khateb et al., 2016; Çakmak and Erçetin, 2018; Jost et al., 2018; Vander Beken et al., 2020), outliers were defined, for each measure, as values lying more than 3 SDs from the sample mean. No participants were excluded on these criteria. Effect sizes for ANCOVAs were calculated via partial eta squared (ηp²), with standard benchmarks for small (ηp² = 0.01), medium (ηp² = 0.06), and large (ηp² = 0.14) effects (Cohen, 1988). Effect sizes for pairwise comparisons were obtained through Cohen's d. These analyses were run on IBM SPSS Statistics (v. 26).
To further explore the role of attentional differences, we implemented a mediation model per task to examine whether attentional speed mediated the link between AoA and information recall. Mediation analysis quantifies the extent to which the effect of a predictor on an outcome is transmitted through one or more intermediate (mediating) variables (Schoemann et al., 2017). Mediation analyses were performed on Jamovi (2020), v. 1.2. Alpha was set at 0.05 for all tests. All experimental data are fully available online (Chou, 2021).

Correlation Between SLR and CI Outcomes
In an exploratory analysis, we examined whether SLR and CI outcomes were associated in each group. Given that the data were normally distributed in both groups, we computed Pearson's correlations. We observed a significant positive correlation for EBs (r = 0.749, p < 0.001) and a non-significant correlation for LBs (r = 0.114, p = 0.586).

DISCUSSION
This study explored the impact of AoA on two naturalistic information recall tasks: SLR and CI. In both cases, information completeness was higher for EBs than LBs, irrespective of attentional speed. No significant group differences emerged in the remaining control measures (e.g., speech rate). These findings are discussed below.
In SLR, EBs recalled more information than LBs. This aligns with previous studies showing that AoA is associated with information encoding and retrieval in cued word (Yoo and Kaushanskaya, 2016; Macmillan et al., 2021) and picture (Volkovyskaya et al., 2017) recall tasks. Crucially, our study extends these findings, suggesting that AoA can also modulate information recall from ecological textual materials. Tentatively, this effect could reflect EBs' greater availability of cognitive resources to meet task-related comprehension and memory demands (Akhtar and Menjivar, 2012). Indeed, earlier AoA has been linked to reduced effort in concomitant linguistic (e.g., morphosyntactic; Ullman, 2001) and executive (Akhtar and Menjivar, 2012) processes. This, we propose, might free cognitive resources to meet information retrieval requirements during SLR.
The advantage of EBs was also significant in CI. This task is more demanding than SLR, as information must first be processed in L2, retained for a brief period, and then retrieved and encoded into L1 (Napier, 2015; Liang et al., 2019). Whereas some AoA effects tend to attenuate or disappear as task complexity increases (Marful et al., 2016; Catling and Elsherif, 2020), this result suggests that early AoA boosts information recall even under stringent processing conditions. Of note, better performance for EBs than LBs has been reported in other studies involving tasks of different complexity, such as forward digit recall, passage completion, and non-word repetition (Delcenserie and Genesee, 2017). In this sense, to our knowledge, our study is the first to show that such demand-independent advantages for EBs also manifest in information recall from natural discourse.

FIGURE 1 | Information recall in naturalistic discourse-level tasks. Early bilinguals correctly recalled more L2 discourse-level information than late bilinguals when tested in L2 (A) and in L1 (B). These effects were uninfluenced by attentional speed. Asterisks (**) indicate significant differences at p < 0.01 in ANCOVAs covarying for attentional speed.

Frontiers in Psychology | www.frontiersin.org | October 2021 | Volume 12 | Article 757351
Moreover, as shown in Table 1, both groups were systematically matched across numerous variables that could affect information recall skills, such as language competence (Schweppe et al., 2015) and working memory (Loaiza et al., 2011; Ljung et al., 2013). In particular, such matching was achieved through a recent tool shown to have high validity for discriminating bilingual individuals based on multiple aspects of their linguistic profile (Schaeffer et al., 2020). In this sense, AoA seems to modulate text-level information recall not only irrespective of task demands, but also regardless of relevant executive and linguistic variables and of the participants' broad bilingual and cognitive profiles. This would suggest that AoA has a direct impact on information recall outcomes. Potentially, this might reflect the cumulative effect of its experiential implications, such as more years of L2 exposure in time windows sensitive to the incidental development of linguistic skills (Paradis, 2009). Accordingly, our findings pave the way for a new research agenda to elucidate the determinants of the observed effect.
Interestingly, EBs also showed enhanced attentional skills. However, ANCOVA results showed that their advantages over LBs in both tasks remained significant after covarying for this factor. Although attentional capacity may affect information recall in single-item tasks (Sauer and Hope, 2016; Unsworth and Miller, 2021), our findings indicate that, with discourse-level material, the overall effect of AoA on recall may supersede that of attentional skills proper. This was corroborated by mediation analyses, which showed that attentional skills did not mediate the effect of AoA on information recall in either task. This finding is in line with evidence that linguistic processing differences between specific bilingual groups are not driven by cognitive control differences (García, 2014; Santilli et al., 2019), and that linguistic processing advantages related to individual bilingual experiences may emerge irrespective of executive outcomes (Archila-Suerte et al., 2015; Santilli et al., 2019). In addition, better information recall for EBs was observed in the absence of speech rate differences, further suggesting that their higher scores were not driven by linguistic productivity.
Further attesting to differences between the two groups, SLR and CI outcomes were significantly (and positively) correlated in EBs, but not in LBs. This suggests that information recall skills in the former group may operate in a task-independent manner, whereas in LBs such abilities may be deployed differently depending on task difficulty. Future studies could illuminate this issue with protocols specifically designed to capture cross-task variability in each population.
Of note, participants' performance was rather low in both conditions, with EBs approaching 30% accuracy. Though somewhat unexpected, this is in keeping with the heterogeneous outcomes reported in the literature, ranging from <45% (Vander Beken and Brysbaert, 2018) to >65% (Hiltunen and Vik, 2017). For example, using L1 and L2 SLR tasks, Vander Beken and Brysbaert (2018) reported scores of 44.1% and 56.3%, respectively. Our participants' lower performance probably reflects our protocol's demands. Whereas Vander Beken and Brysbaert (2018) used written texts and asked participants to summarize them in detail in writing, the present study involved longer auditory texts, with on-the-fly note-taking and subsequent oral recall. Indeed, oral L2 tasks tend to yield lower performance than their written counterparts (Kim, 2015; Vandergrift and Baker, 2015).
Finally, our findings carry theoretical implications. First, whereas multiple studies have revealed AoA effects in highly constrained tasks, ours shows that earlier L2 acquisition can impact text-level processing. In this sense, our study answers recent calls for more ecological approaches to the study of language in general, and of bilingualism in particular (Trevisan et al., 2017; Adams et al., 2018; García et al., 2018; Trevisan and García, 2019; Birba et al., 2020a,b), leading to more situated accounts of the phenomenon. Second, our results reinforce the view that discourse-level recall is influenced by specific bilingual experiences. Hiltunen and Vik (2017) found that prose recall was better in bilinguals with sustained experience in simultaneous interpreting than in non-interpreter bilinguals. The present study indicates that AoA may be yet another subject-level variable shaping this domain within the heterogeneous bilingual population. Third, the impact of AoA per se seems robust enough to persist after accounting for other variables known to affect task performance, such as speech rate and attentional skills. This corroborates the view that AoA may be one of the most important variables accounting for cognitive variability across bilingual individuals (Birdsong, 2018; DeLuca et al., 2020). Together, these considerations can inform and extend current accounts of how different bilingual profiles impact cognitive skills.

LIMITATIONS AND AVENUES FOR FURTHER RESEARCH
Our study is not without limitations. First, our sample size was moderate. Although it was supported by power estimation results and proved similar to or larger than those of other relevant studies in the field (Sabourin et al., 2014; Giezen et al., 2015; Yoo and Kaushanskaya, 2016), future research should replicate our experiment with more participants. Second, the current study used only two tasks, with brief texts that allowed for no interaction. New studies should assess whether AoA impacts recall of longer, interactive pieces of discourse. Third, given our interest in AoA, the present design focused on how bilinguals recall information from L2 texts in both intra- and inter-linguistic conditions. Looking forward, it would be interesting to explore whether and how information recall is affected by AoA during L1 SLR and L1-to-L2 CI. This would provide insights into potential asymmetrical effects known to modulate other aspects of bilingual processing (Xia and Andrews, 2015; Declerck and Grainger, 2017; Olson, 2017). Fourth, our samples contained a majority of female participants. While this reflects the gender distribution of the profession in China (where recruitment was conducted; Han, 2016) and beyond (Hickey, 2019), and favors comparability with earlier research (Injoque-Ricle et al., 2015; Dong and Zhong, 2017; Hiltunen and Vik, 2017; Ünlü and Şimşek, 2018; Santilli et al., 2019), future works should strive for more balanced samples. Also, our median-split strategy for group formation yielded a cut-off of 7 to separate EBs and LBs. While other thresholds, such as AoA 6 or 7, may prove relevant, these are impracticable in our sample as they give rise to highly unbalanced groups. New studies should explore whether the present results are reproduced using other AoA cut-offs. Fifth, participants' notes were discarded, as their analysis was not contemplated in our design.
While these data were orthogonal to our hypotheses, future works should systematically collect and analyze these materials as potential mediators of information recall outcomes. Sixth, the use of real-life materials maximized ecological validity but prevented us from introducing "task" as a within-subject predictor, given that the texts used for SLR and CI differed in many respects. Future works could replicate our study employing validated protocols to create naturalistic texts matched for multiple variables (Trevisan et al., 2017; García et al., 2018; Trevisan and García, 2019; Birba et al., 2020a,b; Moguilner et al., 2021). Seventh, AoA data were gleaned through self-report assessments. Despite their limitations (e.g., social desirability biases), subjective estimations of bilingual profiles are standard in the field (Ardal et al., 1990; Neville, 1996, 2001; Mahendra and Arkin, 2003; Moreno and Kutas, 2005; Hernandez et al., 2007; Pakulak and Neville, 2011; Waldron and Hernandez, 2013; Berken et al., 2015; Nichols and Joanisse, 2016; Santilli et al., 2019; Vilas et al., 2019), and their outcomes can predict language ability (Marian et al., 2007), reproduce reaction-time results (Langdon et al., 2005), and replicate naming test scores (Gollan et al., 2012). Yet, future works should strive to test our hypotheses with objective AoA measures. Finally, low scores were observed across tasks. In addition to task-specific factors (e.g., text length and difficulty, note-taking modality), this might reflect the stringent criteria of our performance judgment protocol. While these criteria have been reported in previous works (Mills et al., 1993; Roediger and Karpicke, 2006; Vander Beken and Brysbaert, 2018; Vander Beken et al., 2020), maximizing comparability between our findings and relevant antecedents, future studies should consider different stringency thresholds.

CONCLUSION
Our study suggests that earlier AoA is associated with better abilities to recall information from naturalistic texts, irrespective of task demands and attentional skills. These results help bridge the gap between classical memory paradigms and ecological designs in bilingualism research. Further work along these lines could afford novel insights into how particular language profiles shape information processing in daily communicative settings.

DATA AVAILABILITY STATEMENT
All experimental data are fully available online at: https://osf.io/xbszv/.

ETHICS STATEMENT
Ethical review and approval were not required for this study on human participants, in accordance with local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
IC: concept and design, data collection, data analysis, and manuscript draft preparation. JH: concept and design, and manuscript revision. EM: manuscript revision. AG: analysis strategy, manuscript draft preparation, and critical revision. All authors contributed to the article and approved the submitted version.