Increases in sensory noise predict attentional disruptions to audiovisual speech perception

We receive information about the world around us from multiple senses, which combine in a process known as multisensory integration. Multisensory integration has been shown to depend on attention; however, the neural mechanisms underlying this effect are poorly understood. The current study investigates whether changes in sensory noise explain the effect of attention on multisensory integration and whether attentional modulations to multisensory integration occur via modality-specific mechanisms. A task based on the McGurk illusion was used to measure multisensory integration while attention was manipulated via a concurrent auditory or visual task. Sensory noise was measured within modality based on variability in unisensory performance and was used to predict attentional changes to McGurk perception. Consistent with previous studies, reports of the McGurk illusion decreased when accompanied by a secondary task; however, this effect was stronger for the secondary visual (as opposed to auditory) task. While auditory noise was not influenced by either secondary task, visual noise increased with the addition of the secondary visual task specifically. Interestingly, visual noise accounted for significant variability in attentional disruptions to the McGurk illusion. Overall, these results suggest that sensory noise may underlie attentional alterations to multisensory integration in a modality-specific manner. Future studies are needed to determine whether this finding generalizes to other types of multisensory integration and attentional manipulations. This line of research may inform future studies of attentional alterations to sensory processing in neurological disorders, such as schizophrenia, autism, and ADHD.

KEYWORDS multisensory integration (MSI), attention, dual task, McGurk effect, perceptual load, audiovisual speech, sensory noise, neural mechanisms

Introduction
The interactions between top-down cognitive processes and multisensory integration have been heavily investigated and shown to be intricate and multidirectional (Talsma et al., 2010; Cascio et al., 2016; Stevenson et al., 2017). Previous research using different methods to manipulate attention and measure multisensory integration has demonstrated that multisensory integration is lessened under high attentional demand and relies on the distribution of attention to all stimuli being integrated (Alsius et al., 2005, 2007; Talsma et al., 2007; Mozolic et al., 2008; Koelewijn et al., 2010; Tang et al., 2016; Gibney et al., 2017). Studies investigating the time point(s) during which attentional alterations influence multisensory processing have identified both early and late attentional effects (Talsma et al., 2007; Mishra et al., 2010). Additionally, multiple areas such as the Superior Temporal Sulcus (STS), Superior Temporal Gyrus (STG), and extrastriate cortex have been identified as cortical loci of attentional changes to multisensory processing (Mishra and Gazzaley, 2012; Morís Fernández et al., 2015). Collectively, these studies suggest that attention alters multisensory processing at multiple time points and cortical sites throughout the sensory processing hierarchy.
The precise mechanisms by which attention alters multisensory integration remain unknown. Multisensory percepts are built through hierarchical processing within sensory systems, coherent activity across multiple cortical sites, and convergence onto heteromodal areas (for an extensive review see Engel et al., 2012). Alterations in attention may primarily disrupt multisensory integration by interfering with integrative processes such as synchronous oscillatory activity across cortical areas or processing of multisensory information within heteromodal areas (Senkowski et al., 2005; Schroeder et al., 2008; Koelewijn et al., 2010; Al-Aidroos et al., 2012; Friese et al., 2016). Attention and oscillatory synchrony have been shown to interact in a number of studies (Gomez-Ramirez et al., 2011; Keil et al., 2016), strengthening the case for this potential mechanism. Although there is convincing evidence for attentional changes to integrative processes, there is a strong likelihood that disruptions in unisensory processing may explain, in part, attentional alterations in multisensory integration. An extensive research literature clearly demonstrates that attention influences unisensory processing within each sensory modality (Woldorff et al., 1993; Mangun, 1995; Driver, 2001; Pessoa et al., 2003; Mitchell et al., 2007; Okamoto et al., 2007; Ling et al., 2009). Additionally, attention has been shown to improve the neural encoding of auditory speech in lower-order areas and to selectively encode attended speech in higher-order areas (Zion Golumbic et al., 2013). Alterations in the reliability of unisensory components of multisensory stimuli have been clearly demonstrated to alter patterns of multisensory integration such that the brain more heavily weighs input from the modality providing the clearest information (Deneve and Pouget, 2004; Bobrowski et al., 2009; Burns and Blohm, 2010; Magnotti et al., 2013, 2020; Magnotti and Beauchamp, 2015, 2017; Noel et al., 2018a).
Thus, disruptions in attention may result in increased neural variability during stimulus encoding (sensory noise) causing degraded unisensory representations to be integrated into altered multisensory perceptions. Few studies have directly assessed the impact of attention on sensory noise and multisensory integration (Schwartz et al., 2010;Odegaard et al., 2016); thus, more exploration is needed to determine whether attentional influences on multisensory integration may be explained by increases in sensory noise.
Psychophysical tasks utilizing multisensory illusions may be able to determine whether attentional alterations in multisensory integration are mediated by disruptions in modality-specific processing. Multisensory illusions which result from discrepancies in information across modalities are ideally suited for this type of experimental design because the strength of the illusion can be altered by changing the reliability of the component unisensory stimuli, and these effects can be modeled by measuring the ratio of visual and auditory sensory noise (Körding et al., 2007; Magnotti and Beauchamp, 2017). The McGurk effect is a well-known illusion that has been used to study multisensory speech perception (McGurk and MacDonald, 1976) and the effects of attention on audiovisual speech integration. The strength of the McGurk effect has consistently been shown to decrease with increasing perceptual load in dual-task studies (Paré et al., 2003; Alsius et al., 2005, 2007, 2014; Soto-Faraco and Alsius, 2009; Gibney et al., 2017). Because audiovisual speech can be understood through its unisensory components and requires extensive processing of the speech signal prior to integration (Zion Golumbic et al., 2013), there is a strong likelihood that attentional alterations in audiovisual speech integration may be explained by disruptions to the unisensory processing of speech information. Specifically, disruptions in the encoding of visual speech components would be expected to weaken the McGurk effect, while disruptions in the encoding of auditory speech components would strengthen it.
In this study, we investigate attentional influences on early auditory and visual processing by examining modality-specific attentional changes to sensory noise. In two separate experiments, participants completed a McGurk task that included unisensory and congruent multisensory trials while concurrently completing a secondary auditory or visual task. Sensory noise was calculated from the variability in participants' unisensory responses separately for the auditory and visual modalities. Multiple regression analysis (MRA) was then used to determine the impact of visual noise, auditory noise, and distractor modality on McGurk reports at baseline and on changes in McGurk reports with increasing perceptual load. We predicted that increases in perceptual load would lead to decreases in the McGurk effect and increases in sensory noise within the same modality as the distractor. Additionally, we predicted that changes in McGurk reports with increasing load would be best predicted by changes in visual noise (as compared to changes in auditory noise).
Gibney et al. (2017). Twenty-three (23) participants completed both experiments in separate sessions. Participants were excluded from final analysis if they did not complete at least four repetitions of every trial type (n = 45) or did not have a total accuracy of at least 60% on the distractor task for the high load condition (n = 12). Thus, 115 participants were included in the final analysis. Participants reported normal or corrected-to-normal hearing and vision and no prior history of seizures. Participants gave written informed consent and were compensated for their time. Study procedures were conducted under the guidelines of the Declaration of Helsinki and approved by the Oberlin College Institutional Review Board.

Experimental design overview
We employed a dual-task design to determine the effects of attention within a specific sensory modality on McGurk perceptions and on sensory noise within each modality. Similar dual-task designs have been shown to reduce attentional capacity (Lavie et al., 2003; Stolte et al., 2014; Bonato et al., 2015). Participants completed a primary McGurk task concurrently with a secondary visual or auditory distractor task for which the level of visual or auditory perceptual load was modulated. Full methodology for both the primary McGurk task and the secondary distractor tasks has been previously published in Dean et al. (2017) and Gibney et al. (2017); however, we provide a brief overview of all tasks here. All study procedures were completed in a dimly lit, sound-attenuated room. Participants were monitored via closed-circuit cameras for safety and to ensure on-task behavior. All visual stimuli were presented on a 24" Asus VG 248 LCD monitor at a screen resolution of 1,920 × 1,080 with a refresh rate of 144 Hz at a viewing distance of 50 cm from the participant. All auditory stimuli were presented from Dual LU43PB speakers, which were powered by a Lepas LP-2020AC 2-Ch digital amplifier and were located to the right and left of the participant. SuperLab 4.5 software was used for stimulus presentation and participant response collection. Participants indicated their responses on a Cedrus RB-834 response box, and responses were saved to a txt file.

McGurk task
Participants were presented with videos of a woman speaking one of four syllables: "ba" (/ba/), "ga" (/ga/), "da" (/da/), or "tha" (/tha/, voiceless) (Figure 1A). Trials were either unisensory (visual-only; auditory-only) or multisensory (congruent; incongruent illusory; incongruent non-illusory). In unisensory trials, participants were presented with either the visual (visual-only) or auditory (auditory-only) components of the video for each syllable. Multisensory videos had both an auditory and a visual component and were either congruent (e.g., visual "ba" auditory "ba"), incongruent non-illusory (visual "ba" auditory "ga"), or incongruent illusory (visual "ga" auditory "ba"). Participants responded to the prompt, "What did she say?" by pushing one of four buttons labeled "ba," "ga," "da," or "tha." Although eye movements were not monitored, participants were explicitly instructed to maintain their gaze on the speaker's mouth throughout the duration of the study. Each unisensory syllable was repeated 8 times for a total of 32 visual-only and 32 auditory-only trials. Each congruent multisensory syllable was repeated 8 times for a total of 32 congruent multisensory trials. Lastly, there were 16 illusory incongruent and 16 non-illusory incongruent trials.
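The trial counts described above can be made concrete with a short Python sketch. This is only an illustration of the arithmetic, not the SuperLab configuration used in the study: the function name `build_mcgurk_trials`, the tuple layout, and the seeded shuffle are all assumptions introduced here, and the text does not state how trials were ordered within a block.

```python
import random

SYLLABLES = ["ba", "ga", "da", "tha"]

def build_mcgurk_trials(seed=0):
    """Return a shuffled list of (trial_type, visual, auditory) tuples.

    Counts follow the text: 8 repetitions per syllable for visual-only,
    auditory-only, and congruent trials, plus 16 illusory and 16
    non-illusory incongruent trials. The shuffle is an assumption.
    """
    trials = []
    for s in SYLLABLES:
        trials += [("visual_only", s, None)] * 8
        trials += [("auditory_only", None, s)] * 8
        trials += [("congruent", s, s)] * 8
    # Illusory pairing: visual "ga" with auditory "ba"
    trials += [("incongruent_illusory", "ga", "ba")] * 16
    # Non-illusory pairing: visual "ba" with auditory "ga"
    trials += [("incongruent_non_illusory", "ba", "ga")] * 16
    random.Random(seed).shuffle(trials)
    return trials
```

Under these assumptions a block contains 32 + 32 + 32 + 16 + 16 = 128 trials.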

Secondary visual distractor task
Rapid serial visual presentation (RSVP) stimuli consisting of white letters, yellow letters, and white numbers were presented continuously below the McGurk videos (Figure 1A). Each letter and number in the RSVP stream was presented for 100 ms with 20 ms between letters and numbers. The visual distractor task included four condition types: distractor free (DF), no perceptual load (NL), low perceptual load (LL), and high perceptual load (HL). During distractor-free blocks, no visual or auditory distractors were presented; thus, participants completed the McGurk task in isolation. When the RSVP stream was presented concurrently with the McGurk task, participants were asked to either ignore it (NL), detect infrequent yellow letters (LL), or detect infrequent white numbers (HL). There was a 50% chance that the target would be present in each trial. After each presentation, participants were asked to respond first to the McGurk task and then to report whether they observed a target within the RSVP stream with a "yes" or "no" button press. Each load condition was completed in a separate block, and the order of blocks was randomized and counterbalanced across participants. Participants completed all perceptual load blocks in one session.

FIGURE 1
Psychophysics tasks and sensory noise calculations. (A) Participants watched videos of a woman speaking one of four syllables, after which they reported if she said: "ba," "ga," "da," or "tha." Rapid serial visual presentation (RSVP) or rapid serial auditory presentation (RSAP) stimuli accompanied speech videos during no load (NL), low load (LL), and high load (HL) blocks. For the visual distractor task, participants detected a yellow letter (LL) or a white number (HL). For the auditory distractor task, participants detected a high-pitched tone (LL) or a long-duration tone (HL). Identifiable human image used with permission. (B) Mapping of possible responses in representative audio-visual space. (C) Sensory noise calculations for an example participant. Sensory noise was calculated for each participant using responses from visual-only (top) and auditory-only (bottom) trials. Gaussian distributions of these responses were determined via bootstrapping (middle), and the standard deviation of this distribution was calculated for each syllable. The overall visual (top, last panel) and auditory (bottom, last panel) noise for each participant was calculated as the average standard deviation of all syllables within each modality.

Secondary auditory distractor task
Stimuli consisted of rapid serial auditory presentation (RSAP) of musical notes at frequencies between 262 and 523 Hz. Each note was presented for 100 ms with 20 ms between notes (Figure 1A). As in the visual distractor task, there were four auditory perceptual load conditions: no distractors were presented alongside McGurk stimuli (DF); distractor stimuli were present but not attended (NL); participants were asked to detect a tone of significantly higher pitch (1,046-2,093 Hz) than the standard tones (LL); or participants were asked to detect notes that were twice the duration of the standard tones (HL). For LL and HL trials, there was a 50% probability that the target would be present in the RSAP stream. After each presentation, participants first responded to the McGurk task, then selected "Yes" or "No" to indicate if they observed the target. Participants completed all perceptual load blocks in one session.

Data analysis

Psychophysical analyses
Responses for incongruent illusory trials on the McGurk task were divided into "visual" ("ga"), "auditory" ("ba"), and "fused" ("da" or "tha"). Percent fused reports were calculated for each participant for each perceptual load condition and distractor modality. We conducted a repeated-measures analysis of variance (RMANOVA) on percent fused reports with load (NL or HL) as a within-subject factor and distractor task modality (visual or auditory) as a between-subjects factor to determine whether increasing perceptual load affected the perception of the McGurk illusion and whether this effect was modulated by distractor modality.
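The response classification and percent fused measure described above can be sketched in Python as follows. The function names `classify_response` and `percent_fused` are hypothetical helpers introduced for illustration; the study itself collected responses in SuperLab and the original analysis scripts are not described in the text.

```python
# Mapping of responses on incongruent illusory trials
# (visual "ga" paired with auditory "ba"), as described in the text.
RESPONSE_CLASS = {
    "ga": "visual",
    "ba": "auditory",
    "da": "fused",
    "tha": "fused",
}

def classify_response(response):
    """Return 'visual', 'auditory', or 'fused' for one illusory-trial response."""
    return RESPONSE_CLASS[response]

def percent_fused(responses):
    """Percentage of illusory-trial responses classified as fused."""
    if not responses:
        raise ValueError("at least one response is required")
    n_fused = sum(classify_response(r) == "fused" for r in responses)
    return 100.0 * n_fused / len(responses)
```

In this sketch, `percent_fused` would be called once per participant for each load condition, and the resulting values would then enter the RMANOVA.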

Sensory noise calculations
Previous models have been developed to determine sensory noise (Magnotti and Beauchamp, 2015, 2017). However, these models do not account for visual and auditory noise independently. Including visual and auditory noise independently permits investigations into how distractors impact the precision of information available when forming McGurk percepts, which may be important for understanding attentional influences on multisensory integration. We assessed sensory noise in both modalities using variability in responses to unisensory visual and auditory presentations. Previous studies determined that the encoding of auditory and visual cues follows separate Gaussian distributions and that the variance of each distribution reflects sensory noise (Ma et al., 2009; Magnotti and Beauchamp, 2017). Responses to visual-only and auditory-only trials were used to estimate sensory noise separately for each experimental condition: syllable presented ("ba," "tha," "da," "ga"), distractor modality (auditory or visual), and perceptual load (DF, NL, or HL). Each response was assigned a value reflecting the reported syllable's relative location in audiovisual perceptual space (Figure 1B; Ma et al., 2009; Olasagasti et al., 2015; Magnotti and Beauchamp, 2017; Lalonde and Werner, 2019). In line with previous work, fused reports were placed between "ba" and "ga" (Magnotti and Beauchamp, 2017). However, our study design permitted two options, "da" and "tha," for fused responses. To account for differences between the two syllables, we used a 10-point scale. This permitted us to separate "tha" and "da" to accommodate previous findings that "tha" is more similar to "ba," while "da" is more similar to "ga" (Lalonde and Werner, 2019). Further, Lalonde and Werner identified multiple consonant groups separating each syllable; thus, a 10-point scale reflects the distance in audiovisual space between each syllable.
We bootstrapped 10,000 samples for each participant's responses to each syllable presented during auditory-only and visual-only trials (Figure 1C; Stein et al., 2009). We calculated overall visual (σ_Vis) and auditory (σ_Aud) noise for each condition by averaging the sensory noise across all syllables presented during visual-only or auditory-only trials. Finally, we calculated combined sensory noise to account for both visual and auditory noise. The equation used for this measure is based on calculations from maximum likelihood estimation models (Ernst and Banks, 2002) and is comparable to models using the auditory/visual noise ratio (Magnotti and Beauchamp, 2017; Magnotti et al., 2018). This produces a distribution of combined sensory noise values between −1 and 1, with values >0 indicating that visual noise is greater.
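A minimal Python sketch of one possible reading of this procedure is given below. Several details are assumptions and are not taken from the text: the numeric positions in `SCALE` are hypothetical (the text specifies only a 10-point scale with "tha" nearer "ba" and "da" nearer "ga"), the noise estimate is taken here as the standard deviation of bootstrapped response means, and `combined_noise` uses a normalized difference chosen only because it matches the stated range of −1 to 1 with positive values indicating greater visual noise. The exact equation used in the study may differ.

```python
import random
import statistics

# Hypothetical positions on a 10-point audiovisual scale (assumption).
SCALE = {"ba": 1.0, "tha": 4.0, "da": 7.0, "ga": 10.0}

def bootstrap_noise(responses, n_boot=10_000, seed=0):
    """SD of bootstrapped response means for one presented syllable.

    Resamples the responses with replacement n_boot times and returns
    the standard deviation of the resampled means (one possible reading
    of the procedure; n_boot must be at least 2).
    """
    rng = random.Random(seed)
    values = [SCALE[r] for r in responses]
    n = len(values)
    boot_means = [statistics.fmean(rng.choices(values, k=n))
                  for _ in range(n_boot)]
    return statistics.stdev(boot_means)

def modality_noise(responses_by_syllable, n_boot=10_000, seed=0):
    """Average the per-syllable noise estimates within one modality."""
    return statistics.fmean(
        bootstrap_noise(r, n_boot=n_boot, seed=seed)
        for r in responses_by_syllable.values()
    )

def combined_noise(sigma_vis, sigma_aud):
    """Assumed normalized difference in [-1, 1]; > 0 means more visual noise."""
    total = sigma_vis + sigma_aud
    if total == 0:
        return 0.0  # both modalities noiseless: treat as balanced
    return (sigma_vis - sigma_aud) / total
```

Under this reading, a participant who gives the same response on every repetition of a syllable has a noise estimate of exactly zero for that syllable.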

Multiple regression modeling
We developed two multiple linear regression models to determine the effect of sensory noise on McGurk perceptions. We chose linear regression to investigate the roles of attention and sensory noise in the likelihood of perceiving the McGurk effect. Additionally, relevant factors used in the analyses showed significant linear relationships with our dependent variables. The first model investigated factors contributing to McGurk responses at baseline, and the second investigated changes in McGurk responses with increasing perceptual load. All testing and model assessments were carried out in SPSS. First, preliminary model fitting was conducted on data from individuals excluded (n = 57) due to poor distractor task performance and lack of unisensory data to explore the relationship between baseline McGurk values and multiple possible predictor variables. These variables included visual noise, auditory noise, distractor modality, accuracy on auditory and visual distractor tasks, and interaction terms. Preliminary results suggested that visual noise, auditory noise, and the combination of the two could be predictive of McGurk responses. After determining potential predictors from excluded data, we then determined whether McGurk responses at baseline (distractor-free condition) correlated with each sensory noise measure (visual, auditory, and combined) to construct the final multiple regression model. Importantly, this baseline regression model allowed us to better contextualize our results and our novel method of estimating sensory noise within modality in the context of previous studies which also relate sensory noise to measures of multisensory integration.
Our second multiple regression analysis modeled the change in McGurk perception from NL to HL (ΔMcGurk = HL McGurk reports − NL McGurk reports). To determine which predictive variables to include, we performed an RMANOVA with visual noise, auditory noise, and combined noise as dependent variables, with load (NL and HL) as a within-subjects factor and distractor modality as a between-subjects variable. The variables that were significantly predicted by load were included in a single-step multiple regression model of ΔMcGurk: distractor modality, change in visual noise, and baseline McGurk values. Notably, changes in auditory noise and combined noise were excluded because neither these variables nor their interaction with distractor modality were significantly predicted by load, nor did they correlate with changes in McGurk reports across load.
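The structure of this second model can be sketched in Python as an ordinary least-squares fit. This is an illustration only: the study's models were fitted in SPSS, and the dictionary keys, the 0/1 coding of distractor modality, and the helper names `ols` and `fit_delta_mcgurk` are assumptions introduced here. The sketch returns unstandardized coefficients and does not compute significance tests.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of predictor rows without an intercept column.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def fit_delta_mcgurk(participants):
    """Fit delta-McGurk on modality, change in visual noise, and baseline.

    Each participant is a dict with hypothetical keys: 'nl_fused',
    'hl_fused', 'baseline_fused', 'nl_vis_noise', 'hl_vis_noise', and
    'visual_distractor' (assumed coding: 1 = visual, 0 = auditory).
    """
    X = [[p["visual_distractor"],
          p["hl_vis_noise"] - p["nl_vis_noise"],
          p["baseline_fused"]] for p in participants]
    # Dependent variable: HL McGurk reports minus NL McGurk reports
    y = [p["hl_fused"] - p["nl_fused"] for p in participants]
    return ols(X, y)
```

The returned list is ordered as intercept, distractor modality, change in visual noise, and baseline McGurk reports.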

Results
Participants completed a McGurk detection task to assess their integration of speech stimuli. This task was completed alone (DF) or in addition to a secondary distractor task at various perceptual loads (NL and HL). Participants were separated by which distractor modality (auditory or visual) was presented during the dual-task conditions.

Attentional alterations to McGurk perception
To assess baseline levels of multisensory integration, percent fused responses ("da" or "tha") were calculated for illusory trials (auditory "ba" and visual "ga") during the distractor-free block (Figure 2). Independent t-tests revealed significant differences in mean baseline illusory percepts between the auditory distractor group (percent fused = 41.05) and visual distractor group (percent fused = 68.11; t(105) = 4.54, p = 1.50 × 10⁻⁵, Cohen's d = 0.724). These differences were confirmed with bootstrapped (95% CI: 4.45-32.04, p = 0.015), non-parametric (U = 2,191; N_AudDist = 134, N_VisDist = 58; z = −4.85, p = 1.26 × 10⁻⁶), and Bayesian (t(190) = 4.81, p = 7.4 × 10⁻⁶, BF = 0.00) sample comparisons. Because the distractor-free block was identical for the visual and auditory distractor studies and was most often completed after a NL, LL, or HL block, these results may indicate that McGurk perception is affected by the modality of distractors within the context of the entire task.
To assess how McGurk perception changes with increasing perceptual load, we calculated fused responses during no-load and high-load blocks (Figure 3) for both the auditory distractor group (NL %fused = 45.90, HL %fused = 37.80) and visual distractor group (NL %fused = 60.86, HL %fused = 33.68). A two-way RMANOVA with fused responses as the dependent variable, perceptual load as a within-subjects factor, and distractor modality as a between-subjects factor revealed a main effect of perceptual load (F(1,133) = 48.36, p = 1.45 × 10⁻¹⁰, partial η² = 0.267) and an interaction between load and distractor modality (F(1,133) = 14.15, p = 2.52 × 10⁻⁴). These results indicate that increasing perceptual load leads to a decrease in integration; however, visual distractors led to a greater decrease in integration than auditory distractors. Supplementary material includes figures and statistics for participant distractor task accuracy (Supplementary Figure 1), unisensory and multisensory congruent trial-type accuracy (Supplementary Figure 2), and changes in McGurk reports across NL, LL, and HL (Supplementary Figure 3) for both distractor modalities.

FIGURE 2
McGurk fused reports for distractor-free blocks. The percent of fused reports ("da" or "tha") during distractor-free blocks is shown for each participant for the visual distractor and auditory distractor groups. Horizontal black bars indicate group averages, and violin plots display the distribution of percent fused reports for each task. ***Indicates p < 0.001.

Change in sensory noise
Next, we investigated whether perceptual load increased sensory noise and whether this effect was dependent on distractor or noise modality (Figure 5). For the auditory distractor group, auditory noise (NL σ_Aud = 0.12, HL σ_Aud = 0.12) and visual noise (NL σ_Vis = 0.48, HL σ_Vis = 0.47) remained stable across load. For the visual distractor group, auditory noise remained stable (NL σ_Aud = 0.15, HL σ_Aud = 0.17); however, visual noise increased (NL σ_Vis = 0.52, HL σ_Vis = 0.67).

FIGURE 3
McGurk fused reports for no load (NL) and high load (HL) blocks. The percent of fused reports ("da" or "tha") during NL and HL blocks is shown for each participant for the auditory distractor (A) and visual distractor (B) tasks. Horizontal black bars indicate group averages. Colored lines connect individual percent fused reports across each block, with a green line indicating an increase in fused reports from NL to HL and a red line indicating a decrease. The difference in percent fused reports across load for rapid serial visual presentation (RSVP) and rapid serial auditory presentation (RSAP) tasks is shown in panel (C). ***Indicates p < 0.001 and *indicates p < 0.05.

FIGURE 4
Sensory noise for distractor-free blocks. Auditory and visual sensory noise is shown separately for the auditory distractor (A) and visual distractor (B) groups. Horizontal black bars indicate group averages, and violin plots display the distribution of sensory noise in each modality for each task. Panel (C) shows auditory and visual sensory noise for all participants, with each participant's two values connected by a straight line. ***Indicates p < 0.001.
Visual noise increased from no load to high load in the visual modality only (t(86) = −4.78, p = 7.28 × 10⁻⁶).

FIGURE 5
Changes in sensory noise across perceptual load. Auditory noise does not change with increasing auditory (A) or visual (B) perceptual load. Visual noise increases with increasing visual (E) but not auditory (D) load. HL-NL differences in auditory noise (C) and visual noise (F) confirm that visual load selectively increases visual noise. Horizontal black bars indicate group averages, and violin plots display the distribution of HL-NL sensory noise differences for each distractor and noise modality. ***Indicates p < 0.001.

Baseline McGurk reports
We constructed a multiple linear regression model to determine which sensory noise measures (auditory noise, visual noise, or a combination of both) best predicted baseline McGurk reports. Distractor modality was included in the model because our RMANOVA analyses (described above) identified it as a significant factor. While neither visual noise (r(134) = 0.028, p = 0.701) nor auditory noise (r(134) = 0.118, p = 0.104) correlated with baseline McGurk reports, combined noise did significantly correlate with baseline McGurk reports (r(134) = −0.172, p = 0.017). Thus, we constructed a multiple regression model to predict baseline McGurk reports with distractor modality and combined noise as factors (Table 1). A significant relationship was found (F(2,189) = 13.24, p = 4.16 × 10⁻⁶) with an R² of 0.123. Baseline McGurk reports were significantly predicted by distractor modality (β = −0.306, t = −4.49, p = 1.26 × 10⁻⁵; bootstrap p = 0.0002) and combined noise (β = −0.150, t = −2.20, p = 0.029; bootstrap p = 0.049; Figure 6A). Neither auditory noise (ΔF(1,188) = 0.05, p = 0.817; ΔR² = 0.0002) nor visual noise (ΔF(1,187) = 3.25, p = 0.073; ΔR² = 0.015) significantly increased the predictability of this multiple regression model when added in stepwise fashion, confirming the relative importance of combined noise in predicting baseline McGurk perceptions.

FIGURE 6
Significant predictors of McGurk fused reports. Our first model identified combined sensory noise and distractor modality as significant predictors of fused reports during baseline conditions (distractor free) (A). Changes in fused reports from NL to HL conditions were related to both baseline McGurk fused reports (B) and to the change in visual noise (C). Shaded regions reflect the 95% confidence interval for the regression.

Dual task McGurk reports
We constructed a multiple linear regression model to determine which factors contributed to changes in McGurk reports with increasing perceptual load. To determine which factors to include, we performed separate RMANOVAs with visual noise, auditory noise, or combined noise as dependent variables, perceptual load as a within-subjects factor, and distractor modality as a between-subjects factor. For visual noise, there was a significant main effect of load (F(1,133) = 8.51, p = 0.004, partial η² = 0.060) and distractor modality (F(1,133) = 11.079, p = 0.001, partial η² = 0.077) as well as a significant interaction between load and distractor modality (F(1,133) = 12.38, p = 0.001, partial η² = 0.085). There were no significant effects for auditory noise (load: F(1,133) = 0.164, p = 0.686, partial η² = 0.001; distractor modality: F(1,133) = 3.064, p = 0.082, partial η² = 0.023; interaction: F(1,133) = 0.173, p = 0.678, partial η² = 0.001) or combined noise (load: F(1,133) = 0.720, p = 0.398, partial η² = 0.005; distractor modality: F(1,133) = 0.421, p = 0.517, partial η² = 0.003; interaction: F(1,133) = 0.101, p = 0.751, partial η² = 0.001). Additionally, the change in McGurk reports from no load to high load significantly correlated with the change in visual noise from no load to high load (r(134) = −0.235, p = 0.006) but not with the change in auditory noise (r(134) = −0.085, p = 0.330) or the change in combined noise (r(134) = −0.044, p = 0.615). Collectively, these results suggest that changes in visual noise across load best explain changes in McGurk perception with increasing load as compared to other measures of sensory noise. Thus, we constructed a multiple linear regression model with change in McGurk reports from no load to high load as the dependent variable and the following potential explanatory variables: baseline McGurk reports, change in visual noise, and distractor modality (Table 2).

Discussion
The present study investigated whether variations in sensory noise could explain the impact of attention on multisensory integration of speech stimuli and to what extent this mechanism operates in a modality-specific manner. To examine within-modality effects, we created a novel method of measuring sensory noise based on response variability in unisensory trials. Importantly, this method expands on previous models, allowing us to investigate the effects of visual and auditory noise independently of one another. Consistent with other computational models of multisensory speech integration, the overwhelming majority of participants had higher visual than auditory noise (Massaro, 1999; Ma et al., 2009; Beauchamp, 2015, 2017; Magnotti et al., 2020). Additionally, our combined sensory noise measure, which is the direct equivalent of the sensory noise ratio in the CIMS model (Magnotti and Beauchamp, 2017; Magnotti et al., 2020), was a better predictor of baseline McGurk reports than visual or auditory noise alone. These findings are strongly aligned with other computational measures of sensory noise and lend evidence to the overall importance of sensory noise for multisensory integration. The novel method of estimating sensory noise separately for each modality provides additional functionality to current models of multisensory speech integration, which primarily rely on the relative levels of visual and auditory noise but do not permit either to vary independently (Magnotti and Beauchamp, 2017). These within-modality measures of sensory noise allowed us to identify that changes in visual noise, specifically, were associated with attentional modulations to multisensory speech perception. Increases in visual load led to increased visual noise and decreased McGurk perception. Correspondingly, changes in visual noise were predictive of changes to McGurk reports across load.
These findings suggest that attention alters the encoding of visual speech information and that attention may impact sensory noise in a modality-specific manner. Unfortunately, our method of calculating sensory noise resulted in many participants having an auditory noise value of zero even under high perceptual load, suggesting that this method may not be sensitive enough to estimate very low levels of sensory noise. However, it can accurately determine the individual contributions of and changes to auditory and visual noise on multisensory integration. Our results strongly indicate that modulations of attention differentially impact multisensory speech perception depending on the sensory modality of the attentional manipulation. While we found striking increases in visual noise with increasing visual load, we did not find corresponding increases in auditory noise with increasing auditory load, suggesting a separate mechanism by which auditory attention influences multisensory speech integration. Additionally, while increasing perceptual load led to decreased McGurk reports for both visual and auditory secondary tasks, this effect was more pronounced for the visual task, suggesting that alterations to visual attention may have a heightened impact on multisensory speech integration. Because the auditory and visual secondary tasks differed in ways other than their modality, we cannot eliminate the possibility that these differences account for our observed modality effects. We hypothesize that our visual secondary task engages featural attention, and although our secondary auditory task asked participants to identify auditory features (i.e., pitch and duration), we suspect that participants listened for melodic or rhythmic indicators of targets, which may have engaged object-based attention. Future research is needed to investigate the relative contributions of distractor modality and type of attentional manipulation to multisensory speech integration.
Another potential explanation for distractor modality effects is differential patterns of eye movements. Gaze behavior has been shown to influence the McGurk effect (Paré et al., 2003; Gurler et al., 2015; Jensen et al., 2018; Wahn et al., 2021). Because eye movements were not monitored during this study, future research is needed to investigate whether gaze behavior may explain modality differences in the impact of the secondary task on multisensory speech integration. Surprisingly, McGurk reports differed in the distractor-free condition across auditory vs. visual secondary task groups even though the tasks were identical. This suggests that the sensory modality of a secondary task may influence multisensory speech perception even when not concurrently presented. Approximately 70% of participants completed the distractor-free block after a low load or high load block, suggesting that our secondary task may prime attention to its corresponding modality and subsequently alter speech integration. Interestingly, we did not find differences in sensory noise across distractor modality in the distractor-free condition. This implies that any task context effects may lead to changes in participants' priors or relative weighting of auditory vs. visual speech information (Shams et al., 2005; Kayser and Shams, 2015; Magnotti and Beauchamp, 2017; Magnotti et al., 2020). The current study was not designed to assess order effects; thus, future research is needed to fully investigate modality-specific priming effects and to elucidate the mechanisms by which they may influence multisensory speech perception.
The results of this study inform our understanding of the mechanisms by which attention influences multisensory processing. Multisensory speech integration relies on both extensive processing of the auditory and visual speech signal and convergence of auditory and visual pathways onto multisensory cortical sites such as the Superior Temporal Sulcus (STS) (Beauchamp et al., 2004, 2010; Callan et al., 2004; Beauchamp, 2011, 2012; O'Sullivan et al., 2019, 2021; Ahmed et al., 2021; Nidiffer et al., 2021). Additionally, the functional connectivity between STS and unisensory cortices differs according to the reliability of the corresponding unisensory information (e.g., increased visual reliability will lead to increased functional connectivity between visual cortex and STS) (Nath and Beauchamp, 2011). Our findings suggest that increasing visual load leads to disrupted encoding of the visual speech signal, which then leads to a deweighting of visual information, potentially through decreased functional connectivity between the STS and visual cortex. Interestingly, increasing auditory load does not appear to disrupt multisensory speech integration through the same mechanism. Ahmed et al. found that attention favors integration at later stages of speech processing (Ahmed et al., 2021), suggesting that our secondary auditory task may disrupt later stages of integrative processing. Future research utilizing neuroimaging methodology is needed to link behavioral estimates of sensory noise to specific neural mechanisms.
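The proposed link between increased visual noise and deweighting of visual information can be illustrated with the standard inverse-variance (reliability-weighted) cue-combination rule. This is a generic textbook scheme offered as an assumption for illustration; it is not the sensory noise measure or the regression analysis used in this study, and the function name and noise values below are hypothetical.

```python
def visual_weight(sigma_v: float, sigma_a: float) -> float:
    """Weight given to the visual cue under inverse-variance weighting.

    Each cue's reliability is the inverse of its noise variance, and the
    visual weight is the visual share of the total reliability.
    """
    reliability_v = 1.0 / sigma_v ** 2
    reliability_a = 1.0 / sigma_a ** 2
    return reliability_v / (reliability_v + reliability_a)

# Hypothetical noise levels in arbitrary units; auditory noise is held fixed
# while visual noise rises from the no-load to the high-load condition.
sigma_a = 1.0
w_no_load = visual_weight(sigma_v=2.0, sigma_a=sigma_a)    # 0.25 / 1.25 = 0.2
w_high_load = visual_weight(sigma_v=3.0, sigma_a=sigma_a)  # (1/9) / (10/9) = 0.1
print(f"visual weight, no load: {w_no_load:.2f}; high load: {w_high_load:.2f}")
```

Under this rule, raising visual noise while auditory noise stays constant halves the visual weight in the example, which is one way a rise in visual noise could reduce the visual influence needed for the McGurk illusion.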
Identifying the specific neural mechanisms by which top-down cognitive factors shape multisensory processing is important for our understanding of how multisensory integration functions in realistic contexts and across individual differences. For example, older adults exhibit either intact, enhanced, or shifted patterns of multisensory integration depending on the task utilized in the study (Hugenschmidt et al., 2009; Freiherr et al., 2013; de Dieuleveult et al., 2017; Parker and Robinson, 2018). Interestingly, several studies have shown altered sensory dominance and weighting of unisensory information in older adults when compared to younger adults (Murray et al., 2018; Jones and Noppeney, 2021). Within-modality measures of sensory noise as described in this study may help to illuminate the reasons why certain multisensory stimuli and tasks lead to differences in the multisensory effects observed in the aging population. Cognitive control mechanisms are also known to decline with healthy aging, and manipulations of attention (e.g., dual-task designs) consistently have a larger impact on the elderly (Mahoney et al., 2012; Carr et al., 2019; Ward et al., 2021). Currently, there is a gap in knowledge on how attention may alter relative sensory weighting in older adults that could be addressed by utilizing the experimental design described in this study. Addressing this gap in knowledge could improve our understanding of multisensory speech integration in normal aging and with sensory loss (Peter et al., 2019; Dias et al., 2021) as well as current multisensory screening tools for assessing risks for falls in the elderly (Mahoney et al., 2019; Zhang et al., 2020).
In addition to healthy aging, many developmental disorders are characterized by disruptions to both multisensory functioning and attention, and these neurological processes may interact to worsen the severity of these disorders (Belmonte and Yurgelun-Todd, 2003; de Jong et al., 2010; Kwakye et al., 2011; Magnée et al., 2011; Harrar et al., 2014; Krause, 2015; Mayer et al., 2015; Noel et al., 2018b). Previous research indicates that deficits in processing both speech (van Laarhoven et al., 2019) and non-speech (Leekam et al., 2007) stimuli were present in subjects on the autism spectrum. Sensory noise and its interactions with attention may contribute to differences in ASD sensory processing beyond stimulus signal-to-noise ratio or general neural noise. Investigating these mechanisms may help us understand and identify disruptions in the relationship between multisensory integration and attention, inspiring new strategies for interventions to address altered functioning in these disorders.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the Oberlin College Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.

Author contributions
LK, VF, CD, EP, WK, and CN contributed to the conception and design of the project and wrote the manuscript. LK and CD collected the data for the project. LK, VF, and CN developed the sensory noise analysis, and analyzed and interpreted the final data for the manuscript. CD, EP, and WK contributed to initial data analysis and interpretation. All authors contributed to the article and approved the submitted version.
Aligbe to some of the data collection for this study. We would also like to thank the Oberlin College Research Fellowship and the Oberlin College Office of Foundation, Government, and Corporate Grants for their support of this study.

Conflict of interest
CD is a current employee of F. Hoffmann-La Roche Ltd. and Genentech, Inc. CD's contributions to this study were completed while she attended Oberlin College.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnhum.

SUPPLEMENTARY FIGURE 3
McGurk fused reports for no load (NL), low load (LL), and high load (HL). The percent of fused reports ("da" or "tha") during each block is shown for auditory distractor (A) and visual distractor (B) tasks. Horizontal bars indicate group averages. Colored lines connect individual percent fused reports across each block. Green lines indicate an increase in fused reports and red lines indicate a decrease in fused reports. An RMANOVA revealed that both perceptual load (F(2, 256) = 22.5, p = 9.90 × 10⁻¹⁰, partial η² = 0.148) and the interaction between load and distractor modality (F(2, 256) = 4.7, p = 0.010, partial η² = 0.035) significantly altered percent McGurk reports.
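As a reading aid, partial η² can be recovered from an F statistic and its degrees of freedom through the identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to the values reported in the caption above; the function name is hypothetical, and small discrepancies from the reported effect sizes are expected because the F values are rounded.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values and degrees of freedom as reported in the caption above.
eta_load = partial_eta_squared(22.5, 2, 256)
eta_interaction = partial_eta_squared(4.7, 2, 256)
print(f"load: {eta_load:.3f}, load x modality interaction: {eta_interaction:.3f}")
```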