Vast Amounts of Encoded Items Nullify but Do Not Reverse the Effect of Sleep on Declarative Memory

Sleep strengthens memories by repeatedly reactivating associated neuron ensembles. Our studies show that although long-term memory for a medium number of word-pairs (160) benefits from sleep, a large number (320) does not. This suggests an upper limit to the amount of information that has access to sleep-dependent declarative memory consolidation, which is possibly linked to the availability of reactivation opportunities. Due to competing processes of global forgetting that are active during sleep, we hypothesized that even larger amounts of information would enhance the proportion of information that is actively forgotten during sleep. In the present study, we aimed to induce such forgetting by challenging the sleeping brain with vast amounts of to be remembered information. For this, 78 participants learned a very large number of 640 word-pairs interspersed with periods of quiet awake rest over the course of an entire day and then either slept or stayed awake during the night. Recall was tested after another night of regular sleep. Results revealed comparable retention rates between the sleep and wake groups. Although this null-effect can be reconciled with the concept of limited capacities available for sleep-dependent consolidation, it contradicts our hypothesis that sleep would increase forgetting compared to the wake group. Additional exploratory analyses relying on equivalence testing and Bayesian statistics reveal that there is evidence against sleep having a detrimental effect on the retention of declarative memory at high information loads. We argue that forgetting occurs in both wake and sleep states through different mechanisms, i.e., through increased interference and through global synaptic downscaling, respectively. Both of these processes might scale similarly with information load.

Sleep strengthens memories by repeatedly reactivating associated neuron ensembles. Our studies show that although long-term memory for a medium number of word-pairs (160) benefits from sleep, a large number (320) does not. This suggests an upper limit to the amount of information that has access to sleep-dependent declarative memory consolidation, which is possibly linked to the availability of reactivation opportunities. Due to competing processes of global forgetting that are active during sleep, we hypothesized that even larger amounts of information would enhance the proportion of information that is actively forgotten during sleep. In the present study, we aimed to induce such forgetting by challenging the sleeping brain with vast amounts of to be remembered information. For this, 78 participants learned a very large number of 640 word-pairs interspersed with periods of quiet awake rest over the course of an entire day and then either slept or stayed awake during the night. Recall was tested after another night of regular sleep. Results revealed comparable retention rates between the sleep and wake groups. Although this null-effect can be reconciled with the concept of limited capacities available for sleep-dependent consolidation, it contradicts our hypothesis that sleep would increase forgetting compared to the wake group. Additional exploratory analyses relying on equivalence testing and Bayesian statistics reveal that there is evidence against sleep having a detrimental effect on the retention of declarative memory at high information loads. We argue that forgetting occurs in both wake and sleep states through different mechanisms, i.e., through increased interference and through global synaptic downscaling, respectively. Both of these processes might scale similarly with information load.

INTRODUCTION
It is undisputed that sleep is integral to the formation of longterm memory (Diekelmann and Born, 2010;Walker and Stickgold, 2010;Rasch and Born, 2013;Klinzing et al., 2019). Initially, the idea prevailed that sleep predominantly acts as a passive shield against interference from novel information, as put forward by Jenkins and Dallenbach (1924). Even though modern interpretations of this framework still exist, it is now generally accepted that sleep plays an active role for memory (Ellenbogen et al., 2006b), with the two-stage model of memory formation (Marr, 1971;Diekelmann and Born, 2010;Rasch and Born, 2013;Klinzing et al., 2019) being the prevailing model used in declarative memory research. First introduced by Marr (1971) it offers a solution to the "stability-plasticity-dilemma" (Abraham and Robins, 2005), which refers to the problem how a system can learn new information rapidly and in succession without overwriting older memories (Robins, 1995).
Initially, the hippocampus binds together distributed information in the cortex during encoding (Battaglia et al., 2011). During subsequent sleep, the hippocampus repeatedly reactivates these memories in concert with the neocortical representations (Ji and Wilson, 2007;Grosmark and Buzsáki, 2016;Ólafsdóttir et al., 2016;Khodagholy et al., 2017), thereby strengthening and reorganizing the representations in the neocortex (McClelland et al., 1995;Marshall and Born, 2007). Reactivation of memory traces corresponds to sharp-wave/ripple events evident in the hippocampal local field potential recordings during sleep (Diba and Buzsáki, 2007) that coordinate with sleep spindles and sleep slow oscillations to drive active systems consolidation (Clemens et al., 2007;Staresina et al., 2015;Khodagholy et al., 2017). Although, sleep spindle density and reactivation in the form of sharp-wave/ripples have previously been shown to increase as a response to large amounts of learning material (Gais et al., 2002;Mölle et al., 2009), it is plausible that an active process of sleep on memory is limited by the amount of replay that can be accommodated.
In accordance with that, Feld et al. (2016) recently showed that memory consolidation of declarative content during sleep is limited in capacity. Here, participants learned either a short (40), medium (160), or long (320) list of word-pairs. Participants in the medium information load condition showed a large sleep-dependent memory advantage, whereas those in the high information load condition no longer displayed a sleep benefit. This pattern of results can be explained by a capacity limited process of active systems consolidation that leads to local potentiation of memory traces that is accompanied by a more global process of synaptic rescaling, that depotentiates synapses without being limited (Tononi and Cirelli, 2014;Feld and Born, 2017). Extrapolating from this, at even higher information loads the limited capacity for active systems consolidation is surpassed so that sleep would favor forgetting.
To test this, in the present study, we doubled the information load from 320 to 640 word-pairs. We hypothesized that under this increased memory load, sleep leads to more forgetting of word-pairs compared to a wake group.

MATERIALS AND METHODS Preregistration
A rough outline of this study was preregistered at AsPredicted. org. It can be viewed through this link http://aspredicted.org/ blind.php?x=jc2y8t Participants A total of 78 healthy, non-smoking, German-speaking participants performed the complete study (two participants decided to drop out prematurely). They reported a regular wake-sleep cycle, no intake of regular medication (except contraceptives) or illegal substances, and at least the qualification to enter higher education. Beginning on the morning and throughout the experiment, the intake of caffeine-and alcoholcontaining beverages was prohibited. Participants were randomly assigned to either the wake condition (N = 40; 21 female, age mean: 22.9 years, from 18 to 28 years) or sleep condition (N = 38; 19 female, age mean: 22.7 years, from 18 to 29 years). Participants received adequate monetary compensation for their contribution and provided written informed consent prior to the experiment. The study was approved by the local ethics committee (Ethics Committee of the Medical Faculty at the University of Tübingen).

Procedure
Participants arrived at 11:00 h and were seated in a room with four individual workstations that were positioned to minimize distractions from other participants. See Figure 1 for a timeline of the experimental procedure. After a general instruction, participants completed a working memory capacity test (automated operation span task, OSPAN; Unsworth et al., 2005). From 12:00 to 17:00 h, participants learned the 640 word-pairs during a learning phase divided into two parts of 320 word-pairs with a short snack break in between. The snack consisted of a pretzel and one piece of fruit. Participants were asked not to actively rehearse word-pairs during breaks and oral conversation was restricted. At 17:00 h, participants received a standardized lunch consisting of either pizza or pasta. From 17:30 to 22:30 h, the 640 word-pairs were recalled (immediate recall) in two parts of 320 wordpairs each to estimate, how many word-pairs had been successfully encoded. Again, participants received a standardized snack in between the two parts. The snack consisted of two pieces of bread with cheese or salami and a piece of fruit. At the beginning and at the end of each experimental day as well as right before the immediate recall, the psychomotor vigilance task (PVT; Dinges et al., 1997) as well as the Stanford Sleepiness Scale (SSS; Hoddes et al., 1973) were administered (for details, see below). After recall, participants were assigned to either spend the night in the laboratory watching standardized animal documentaries (wake group) or to sleep at home (sleep group). The animal documentaries were various episodes from either The Life of Birds (Salisbury, 1998), The Life of Mammals (Salisbury, 2002), or Planet Earth (Fothergill, 2006). Between 23:00 h on day 1 and 7:00 h on day 3 of the experiment, all participants wore an actigraph (ActiLife v4.4.0, ActiGraph, Pensacola, FL, United States). Participants staying in the laboratory were offered two snacks throughout the night and left the laboratory at 7:00 h. These snacks consisted of a piece of fruit, a cereal bar, and two slices of raisin bread. All participants were instructed to refrain from napping during the day following the first experimental day. After the wake group had a recovery night, the second day of testing started at 8:00 h (delayed recall; around 33 h after the end of the first experimental day). All participants declared compliance to the sleep-wake schedule of the experiment, which was ratified using actimetry data. At the end of the second experiment day, participants had to complete a word generation task (for details, see below).

Memory Task
The word-pair task was implemented using Presentation ® (version 16.3, Neurobehavioral Systems, Berkeley, CA, United States) on computers running on Windows 7 and adapted from Feld et al. (2016). Prior to each learning or recall block of 40 slightly related word-pairs, participants were instructed how to perform the task followed by two mock trials of the task procedure. Each phase consisted of 16 blocks which amounts to 640 word-pairs per phase. After eight blocks, there was a longer, 30-min break. During the A B FIGURE 1 | (A) Timeline of the experimental procedure. Learning started at 12:00 h and was followed by an immediate recall at 17:30 h as well as a delayed recall at 8:00 h 2 days later. The learning and recall phases each took 5 h and consisted of two parts each roughly 2 h 15 min long, separated by a 30-min break. During each part, participants either learned or retrieved 320 word-pairs for a total of 640 word-pairs. Each part was further divided into eight blocks of 40 word-pairs. Each block took exactly 20 min. During the learning phase participants spent 3 min and 20 s per block learning word-pairs one at a time and then listened to 16 min and 40 s of relaxing audio files. During the recall phase participants had up to 20 s (a maximum of 13 min and 20 s per block) to respond to each of the sequentially shown cue words by typing the correct target word. Participants spent the remaining time listening to a relaxing audio file. Note that both recall phases (immediate and delayed recall) followed exactly the same procedure. (B) Actimetry data. Each participant was given an actigraph at the end of immediate recall to verify compliance. The y-axis shows the activity of each participant during each of the four 8-h periods in arbitrary units. Each raincloud plot consists of the estimated distribution, a box-plot (indicating the median and the 2, 25, and 98% quantiles, the black outlined circles depict the mean) and the activity estimations for each subject as an individual point. Data from participants in the wake group are shown in green, whereas data from the sleep group are shown in red. Note the different scale used here. See Allen et al. (2018) for the code used in this visualization.
learning phase, each word-pair was presented once for 4 s (1 s inter-stimulus interval) on a horizontal axis divided by a hyphen. The left word was always the cue word, whereas the right word was always the target word. The order in which each word-pair occurred within a 40 word-pair block was randomized, and the order of the word-pair blocks was balanced between participants. The block order for each participant was the same for the learning phase and both recall phases. Blocks lasted for 20 min. In the learning phase, this meant participants were actively encoding word-pairs for 200 s (40 pairs × 5 s), while the remaining 1,000 s were spent listening to relaxing audio files (we used a different audio file for each block). During recall, participants were presented the cue word and had 20 s to type in their response. If participants wanted to move on or could not remember the target word, they were able to skip to the next word-pair by pressing the return key. Participants were instructed to answer even if they were not certain of the answer, but to avoid guesses. This instruction was given to prevent participants randomly entering words on every trial even when they were sure not to know the answer. The keyboard input was immediately displayed which made it possible to correct for mistakes. Similar to the learning phase, recall blocks were interweaved by periods during which participants listened to relaxing audios (the same audio files were used for learning and retrieval). Again, blocks would start 20 min apart which resulted in a minimum of 400 s of relaxing audio (if the participant used 20 s to answer each cue word). The recall procedure was identical for the immediate recall after learning and the delayed recall two mornings later (Figure 1). Setting the duration of each block to 20 min provided two benefits: it reduced interference between the blocks and ensured an equal amount of time between each block during the learning phase and its corresponding block during the retrieval phase. Word-pairs were scored manually. If the answer contained spelling mistakes, used the wrong gender, or number, the answer was still checked as correct. In alignment with Feld et al. (2016), we used absolute retention performance (the number of correctly recalled words at immediate recall subtracted from the number of correctly recalled words at delayed recall) as the dependent variable.

Working Memory (OSPAN)
The automated operation span (OSPAN; Unsworth et al., 2005) is a computer-based test to assess working memory capacity. Participants are shown simple mathematical equations in alternation with letters. Their task is to decide if the equations are correct while remembering the letters in the order they were presented. After three to six trials, an array of 12 letters is shown and participants are instructed to click on the previously shown letters. For our analysis, we used the absolute score and a "partial load". The absolute score refers to the sum of all correctly recalled letters, whereas the partial load is calculated by summing up only the letters of the trials that were recalled in the correct order. One dataset was lost due to hardware malfunction. This resulted in 38 datasets in the sleep condition and 39 datasets in the wake condition.
Control Measures (RWT, PVT, SSS, and Actimetry) In order to control for possible confounding variables, several measurements were taken. One of the detrimental effects of sleep deprivation is a diminished verbal fluency due to impaired retrieval processes (Harrison and Horne, 1997). To investigate, if, even after a recovery night, the wake condition was still suffering the negative consequences of sleep deprivation, participants completed a word generation task (RWT: Regensburger Wortflüssigkeits-Test; Aschenbrenner et al., 2000). The assignment was to generate as many words as possible of a given category (hobbies), or to generate words that start with a specific letter (in this case "m") during a 2-min period. For our analysis, we added up both results for a combined score. The PVT measures the participant's mean reaction speed and is an indicator of vigilance. The 5-min test requires pressing the space bar as soon as a bright red millisecond timer appears on the screen of a computer and starts counting up from 0000 in milliseconds immediately. The subject's reaction time is displayed as soon as the space bar is pressed. For our analysis, we calculated the mean reaction speed (1/reaction time) for each participant and the number of lapses, defined as reaction speed above 500 ms (Dinges and Kribbs, 1991). Due to hard drive malfunction, we lost six datasets (three in each condition) for the PVT. The SSS consists of a single item measuring the subjective sleepiness on a scale of 1 ("Feeling active, vital, alert, or wide awake") to 8 ("Asleep"; Hoddes et al., 1973). Five actimetry datasets (one from the sleep condition) could not be recovered due to hardware malfunction.

Statistical Analysis
All statistical analysis was conducted using SPSS Version 22.0.0 on a computer running on Windows 7 and Jamovi Version 1.0.4.0. For descriptive statistics, M stands for the mean and SEM for the standard error of the mean. Unless stated otherwise, we relied on a univariate ANOVA with experimental condition and sex as independent variables. The SSS and the PVT were analyzed using repeated measures ANOVA with each of the five data points as a withinsubjects factor and condition and sex as between-subject factors. The significance threshold for all statistical tests was set at 0.05.

Memory Performance
Recall in the wake group decreased from M Wake = 117.85 (SEM Wake = 7.16) at immediate recall to M Wake = 106.65 (SEM Wake = 6.78) at delayed recall. In the sleep group, the number of remembered word-pairs decreased from M Sleep = 112.08 (SEM Sleep = 9.93) to M Sleep = 103.61 (SEM Sleep = 9.33) at delayed recall. In a confirmatory analysis, the sleep and the wake group did not differ in retention performance (the absolute amount of word-pairs that were forgotten from the immediate recall phase to the delayed recall) on the word-pair task (M Wake = −11.2 SEM Wake = 1.73, M Sleep = −8.5 SEM Sleep = 1.70; sleep/wake: F 1,74 = 1.25, p = 0.27; Figure 2). There was also no effect of sex on retention performance (sex: F 1,74 = 0.012, p = 0.91; sleep/wake × sex F 1,74 = 23.3, p = 0.66). To provide an estimate of the evidence for the null effect, we performed an equivalence test using the two one-sided test procedure (Lakens et al., 2018) with upper and lower effect size bound set at d = ±0.2. Statistically this means, we tested whether the hypotheses that the positive effect of sleep on memory retention is larger than d = 0.2 and that the negative effect of sleep on memory retention is larger than d = 0.2 can be rejected with a one-sided t-test each. Here, there was evidence that a detrimental effect of sleep on retention larger than d = −0.20 can be ruled out (t 76 = 2.01, p = 0.02), whereas, there was no evidence against a positive effect of sleep larger than d = 0.20 (t 76 = 0.24, p = 0.60). This means that for sleep induced forgetting, there is evidence against medium and large effects, whereas small effects (d ≤ 0.2) could exist in our paradigm. For sleepdependent consolidation, even effects larger than d = 0.2 cannot be ruled out. In addition to this analysis, following the null-hypothesis-testing (NHST) tradition, we also used Bayesian statistics to determine evidence for null effects. This analysis provides the likelihood of the model given the data, rather than the probability of the data given the model as in NHST. Similar to the equivalence test, calculating the one-sided Bayes factor provided evidence against a detrimental effect of sleep on memory, it came up with BF 01 = 8.26 in favor of the null, whereas, evidence against a positive effect of sleep vs. the null was undecided BF 01 = 1.45. This means that the null hypothesis of no detrimental effect of sleep on memory is 8.26 times more likely than the alternative hypothesis of a detrimental effect of sleep on memory in comparison with our wake condition.
Next, we investigated whether these results were due to either group having learned more words during the initial learning phase. Although the wake group descriptively learned slightly more word-pairs, this difference did not reach statistical significance (sleep/wake: F 1,74 = 0.17, p = 0.68).

Memory Performance Blockwise
To verify this null-result, we next explored the performance per block for each condition (the experiment consisted of 16 blocks of 40 word-pairs each). As before, the absolute difference was used as a dependent variable in a repeated measures ANOVA with the block (1-16) as the repeated measurement and sleep/wake condition and sex as between subject variables (see Figure 3). However, no significant result emerged from this analysis (sleep/wake:

Gains and Losses
Gains refer to words that were correctly recalled during delayed recall, but not during immediate recall. Losses accordingly refer to words that were correctly remembered during immediate recall, but not during delayed recall. Subjects in the sleep and wake condition both gained on average 12.3 words (SEM Wake = 0.90; SEM Sleep = 0.87). Participants in the wake group lost on average 23.5 words (SEM Wake = 1.49), while participants in the sleep group lost on average 20.79 words (SEM Sleep = 1.62). We found no statistically significant difference between groups for neither gains (sleep/wake: F 1,74 = 0.001, p = 0.97), nor losses (sleep/wake: F 1,74 = 1.49, p = 0.23; Table 1).

A B C
FIGURE 2 | Raincloud plots (curves depict the estimated distribution, box-plots provide the median and the 2, 25, 75, and 98% quantiles, the black outlined circles depict the mean, the dots show individual data points) of the remembered word-pairs for the wake group (green) and the sleep group (red). (A) The absolute amount recalled during immediate recall that was part of the learning phase. (B) The absolute amount recalled during delayed recall that was part of the retrieval phase.
(C) The difference in word-pairs remembered between retrieval and learning. Note the different scale used here. See Allen et al. (2018) for the code used in this visualization.

Wrong Answers
Next, we analyzed the incorrect answers in detail. We investigated if there were group differences between the number of repeated wrong answers (same wrong answer during encoding and retrieval Word-pool errors refer to wrong answers containing a word that was learned at another point in the experiment (either as a cue word or as a target word). Using the relative occurrence within all wrong answers, we calculated the difference between word-pool errors during delayed recall and during immediate recall. We found no statistically significant difference between groups [M Sleep = 0.012 (SEM Sleep = 0.009), M Wake = −0.006 (SEM Wake = 0.009), sleep/wake: F 1,74 = 1.83, p = 0.18; Table 1].

Working Memory Capacity Task
We used a MANOVA because the two dependent variables correlated highly (r 75 = 0.908, p < 0.01). The MANOVA revealed no difference in the OSPAN Absolute score and the OSPAN Partial score between sleep and wake condition (M Absolute, Sleep = 43.18, SEM Absolute, Sleep = 2.73; M Absolute, Wake = 46.64, SEM Absolut, Wake = 2.88; M Partial, Sleep = 60.39, SEM Partial, Sleep = 1.57; M Partial, Wake = 61.26, SEM Partial, Wake = 1.99; sleep/wake: OSPAN Absolute : F 1,73 = 0.84, p = 0.36; OSPAN Partial : F 1,73 = 0.12, p = 0.73; Table 2). To investigate the relation between working memory capacity and memory performance on the word-lists (using the difference score), we calculated a Pearson product-moment correlation between memory retention performance and working memory capacity separately for the sleep and wake conditions, and for the OSPAN absolute and partial score. There was a trend toward significance when looking at the wake condition and the absolute OSPAN score (r 39 = −0.31, p = 0.059), while all other correlations were insignificant (p > 0.11). However, when considering all participants in both groups, there was a statistically significant negative relation between retention performance and the absolute OSPAN score (r 75 = −0.27, p = 0.02).
FIGURE 3 | The difference in remembered word-pairs between the retrieval and learning phase for each of the individual 16 blocks for the sleep (red) and wake groups (green). The horizontal line of each box-plot indicates the median, the black outlined circles depict the mean, the border of the box indicates the 25 and 75% quartiles, and the whiskers the 2 and 98% quantiles, respectively.

Actimetry Data
To analyze actimetry data, we used repeated measures ANOVA to compare both groups' activity levels in four 8 h windows. We found a statistically significant sleep/wake × time interaction (F 1.92, 136,55 = 35.1, p < 0.001, η 2 = 0.33; Greenhouse-Geisser corrected). Subsequent two-tailed t-tests revealed, that the difference was mainly driven by a difference in the first 8 h window, i.e., during sleep deprivation in the wake group (t 48.0 = −21.2, p < 0.001; Figure 1B). Actimetry data in all other time windows showed no significant difference between the wake and sleep group (all p > 0.09).

Subjective Sleepiness
The SSS score at some time points was affected by the sleep/ wake condition (sleep/wake × time: F 4,304 = 2.46, p = 0.046, η 2 = 0.031). Individual two-tailed t-tests revealed that this difference was mainly driven by higher subjective sleepiness in the wake group during the fourth (t 76 = −1.92, p = 0.058) and fifth (t 76 = −1.97, p = 0.053) assessment points, which both occurred on the second day of the experiment ( Table 2).

Vigilance (PVT)
There was no significant difference between groups' average reaction speed (all p > 0.24). Likewise, there was no significant main effect or interaction of condition regarding the number of lapses (all p > 0.44; Table 2).

Animal-Related Words
Animal-related words might have been reactivated while participants watched animal documentaries during sleep deprivation. To test the extent of this potential confound, we excluded all animal-related words in an exploratory analysis. For this, we first vectorized all words using the MATLAB function word2vec (Mikolov et al., 2013) and a German pre-trained word embedding (Grave et al., 2018) that was trained on over 6.7 billion tokens to compute the cosine similarity between the vector representation of the German word for animal and all words used in the experiment. We used the animal with the lowest cosine similarity to the reference vector as a cut-off point. All trials that contained words with a higher cosine similarity were excluded leaving 473 word-pairs for the analysis. One word (San Francisco) was not represented in the word embedding and was manually identified as not animal-related. A univariate ANOVA on the pruned dataset with the difference score as the dependent variable and condition and sex as the independent variables revealed no significant results (all main effects and interactions p > 0.27).

DISCUSSION
Previous work suggests that sleep-dependent memory consolidation is a process limited in capacity and that learning large amounts of information overloads active systems consolidation and abolishes the positive effect of sleep on memory retention (Feld et al., 2016). Extrapolating from this data, we hypothesized that at even higher loads of information during encoding sleep may favor forgetting over consolidation (Feld and Born, 2017). Here, we directly tested this hypothesis by asking participants to encode a very large amount of information (640 word-pairs, twice the amount used before in the long list condition of Feld et al., 2016) before either a night of sleep or total sleep deprivation. Contrary to our predictions, we found word-pair retention to be comparable between the sleep and wake groups. While this null-effect can be reconciled with the view that capacities for consolidating memory during sleep are limited, it contradicts our hypothesis that sleep causes increased forgetting of declarative memory when compared to wakefulness. It is important to note that sleep might still induce forgetting under conditions of massed learning, but that this effect might be masked by a direct comparison with a wake condition which itself induces forgetting (as discussed in detail below). A thorough post-hoc analysis revealed no group differences regarding a multitude of response patterns (such as gain and loss, word-pool errors, wrong answers, and exclusion of animal-related words). In light of these results, we propose that similar amounts of forgetting of memory traces could be achieved by different processes for the wake and sleep group, respectively. It has long been known that retroactive interference during wakefulness affects retention, which is absent during sleep. Therefore, in the wake group, memory traces are more prone to interference, whereas during sleep memories are protected Frontiers in Psychology | www.frontiersin.org from interference (Wixted, 2005;Ellenbogen et al., 2006a). However, in the sleep group, memories might be forgotten due to global synaptic downscaling (Tononi and Cirelli, 2006). Hence, our failure to find sleep enhanced forgetting can be explained by wake forgetting accelerating at a similar pace with increasing length of the word-lists making it very difficult to dissociate these processes. Importantly, the absence of a specific association between working memory capacity and retention performance is not at odds with prior research (e.g., Kane and Engle, 2000), as interference during learning was kept constant between the sleep and wake group and working memory was not specifically loaded during wake retention. Accordingly, if forgetting in the wake condition is primarily driven by interference within the task, then increasing or decreasing this interference (e.g., by making the stimulus material more or less semantically related) will lead to more forgetting in participants in the wake condition, but not in the sleep condition (Drosopoulos et al., 2007;Yonelinas et al., 2019). To reduce within task interference, it may also be interesting to use free recall of word lists instead of cuedrecall of word-pairs. However, our analysis of word pool errors did not indicate that there was increased within task interference in the wake group. Alternatively, task-unrelated interference could be manipulated by asking participants in the wake condition to learn an unrelated verbal memory task during sleep deprivation. Contrasting with the wake state, we assume that global synaptic downscaling causes forgetting in the sleep group, which could equally be manipulated in this paradigm. Since global synaptic downscaling is assumed to occur primarily during slow waves (Vyazovskiy et al., 2008;Kim et al., 2019), closed loop auditory stimulation could be employed to increase slow waves, causing more forgetting in participants that have previously encoded a high amount of information (Ngo et al., 2013). Conversely, preventing participants from reaching deep sleep should lead to less forgetting. This would prevent forgetting in two ways, first by preventing interference through new encoding, as long as sleep itself is maintained, and second by preventing synaptic downscaling during deep sleep. An alternative account of the absence of an enhanced forgetting during sleep, induced by massed learning, can be derived by considering recent findings of synaptic downscaling mechanisms during sleep. Although it has been suggested that active systems consolidation is specific and selective, whereas synaptic downscaling is global and indiscriminate (e.g., Feld and Born, 2017), there also exists the opposite suggestion of selective downscaling during sleep (Tononi and Cirelli, 2014). This latter account is largely based on findings of selective weakening of synapses during sleep, where only weaker, more plastic, synapses are erased, while stronger synapses remain stable (de Vivo et al., 2017). In addition, it has been found that sharp-wave/ripples (a correlate of memory reactivation during sleep) are involved in the depotentiation of synapses within the hippocampus (Norimoto et al., 2018), whereas they appear to be involved in the potentiation of synapses in the cortex during sleep spindles (Khodagholy et al., 2017). This offers the intriguing possibility that active systems consolidation and selective synaptic downscaling during sleep occur in a highly coordinated fashion, i.e., during the same reactivation events but in different brain areas. According to this framework the successful integration of memories into the knowledge stores of the cortex via active systems consolidation would signal the deletion of redundant memory traces from the hippocampus through synaptic downscaling. In conclusion, similar to active systems consolidation, selective synaptic downscaling during sleep might be limited by the number of available reactivations during sleep. This offers an explanation for the lack of sleepinduced net forgetting.
Turning to our findings on working memory, prior work by Fenn and Hambrick (2012) reported a positive relation between working memory and sleep related memory performance. In contrast to that, Feld et al. (2016) measured working memory performance before any sleep/wake intervention took place (which rules out any biases due to the circadian rhythms or sleep manipulation) and found no significant correlation between the two. Using the same methodology, in the present study, we found a negative correlation, although most did not reach significance. Given these contradicting findings, we suggest that sleep-dependent memory performance is likely independent of working memory functioning.
There are several limitations that were impossible to eliminate in this study and therefore possibly contribute to our results. (1) Variability in memory performance between subjects increases with list size and although the sample size was large in comparison to other studies (Quigley et al., 2000;Ellenbogen et al., 2006a;Marshall et al., 2006;Ngo et al., 2013;Igloi et al., 2015;Studte et al., 2017), this probably decreased statistical power. To ameliorate this noise issue, criterion learning could be used in future studies, where learning is repeated until a certain percentage of word-pairs can be recalled correctly. Importantly, a study comparing different criterions found that a 60% criterion is well-suited to tap into the sleep effect (Drosopoulos et al., 2007). We did not use this method, as it would have consumed significantly more time for learning, which would have made it impossible to space out learning und thus strongly increase interference effects. (2) It is possible, that not enough time had passed for sleep effects to emerge. Already Graves (1936) using nonsense-syllables found a sleep benefit only after 72 h and not at shorter intervals. Similarly, Richardson and Gough (1963) did not find a sleep effect after 24/48 h, but after 144 h. Especially, for large amounts of information, it is conceivable that consolidation as well as forgetting is carried over to subsequent nights. (3) We tested declarative memories that were intentionally encoded. It might be that the underlying processes such as an enhanced activation of prefrontal-hippocampal circuitry, preclude such information from sleep-dependent forgetting (Himmer et al., 2017), which stimulates the idea to compare, in future studies, sleep effects on high loads of intentionally and incidentally encoded memory. (4) Doubling the list length did not double the amount of word-pairs remembered immediately after learning.  640 = 117.85 (7.16) in the current 640 word-pair version. This may indicate that other encoding strategies were used that render the word-pairs less susceptible to effects of reprocessing during sleep. (5) Using actimetry data, we are limited in identifying short periods of sleep during the day. Future studies could implement electroencephalogram (EEG) recordings to aid in that regard. Electrophysiological recordings could also rule out if memory consolidation in the wake group was merely postponed to the recovery night. (6) Although the animal documentaries that the wake group watched during sleep deprivation have been selected to be emotionally neutral, it is virtually impossible to rule out that some events reactivated associations with the previously learned word-pairs. We found that even after excluding animal-related words memory performance between participants in the wake and sleep group did not significantly differ. This speaks against the idea that watching animal documentaries in the wake group influenced memory consolidation to a meaningful extent.
To conclude, in the current experiment we did not find evidence that a high information load leads to more forgetting during sleep when compared to wakefulness. These findings can be explained by different mechanisms leading to forgetting in both brain states: interference-induced forgetting in the wake group and forgetting due to global synaptic downscaling in the sleep group. We propose several approaches how future studies can test this new hypothesis.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the ethics committee of the medicine department of the Eberhard Karls University and university clinic Tübingen. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
GF and JB designed the experiment. LK carried out the experiment, performed the analyses, designed the figures, and wrote the initial manuscript. GF performed the equivalence test and the bayesian statistics. All authors contributed to the article and approved the submitted version.

FUNDING
This research was supported by grants from the Deutsche Forschungsgemeinschaft (DFG) SFB 654 "Plasticity and Sleep" and the European Research Council (grant No. 883098 SleepBalance) to JB and a DFG Emmy Noether Grant (FE 1617/2-1) to GF. The authors declare no competing financial interests.