Statistical context learning in tactile search: Crossmodally redundant, visuo-tactile contexts fail to enhance contextual cueing

In search tasks, reaction times become faster when the target is repeatedly encountered at a fixed position within a consistent spatial arrangement of distractor items, compared to random arrangements. Such “contextual cueing” is also obtained when the predictive distractor context is provided by a non-target modality. Thus, in tactile search, finding a target defined by a deviant vibro-tactile pattern (delivered to one fingertip) from the patterns at other, distractor (fingertip) locations is facilitated not only when the configuration of tactile distractors is predictive of the target location, but also when a configuration of (collocated) visual distractors is predictive, with intramodal-tactile cueing mediated by a somatotopic and crossmodal-visuotactile cueing by a spatiotopic reference frame. This raises the question of whether redundant multisensory, tactile-plus-visual contexts would enhance contextual cueing of tactile search over and above the level attained by unisensory contexts alone. To address this, we implemented a tactile search task in which, in 50% of the trials in a “multisensory” phase, the tactile target location was predicted by both the tactile and the visual distractor context; in the other 50%, as well as in a separate “unisensory” phase, the target location was predicted solely by the tactile context. We observed no redundancy gains by multisensory-visuotactile contexts, compared to unisensory-tactile contexts. This argues that the reference frame for contextual learning is determined by the task-critical modality (somatotopic coordinates for tactile search). Moreover, whether redundant predictive contexts from another modality (vision) can enhance contextual cueing depends on the availability of the corresponding spatial (spatiotopic-visual to somatotopic-tactile) remapping routines.

KEYWORDS tactile search, contextual cueing effect, remapping, multisensory learning, crossmodal plasticity

1 Introduction

1.1 Contextual cueing in the individual modalities of vision and touch
Attention is guided by a number of separable mechanisms that can be categorized as bottom-up driven (such as guidance by salient physical properties of the current stimuli) or top-down controlled (such as guidance by observers' "online" knowledge about search-critical object properties; Wolfe and Horowitz, 2017). These processes are augmented by the automatic extraction of statistical co-occurrences of objects in the environment, giving rise to attention-guiding spatial long-term (LT) memories. For instance, repeatedly encountering a searched-for target item at a particular location within a visual scene of consistently arranged distractor items leads to the formation of LT relational distractor-target memories, which, upon being activated by the currently viewed search display, (relatively) efficiently direct attentional scanning toward the target location (Goujon et al., 2015; Sisk et al., 2019). This effect was first described by Chun and Jiang (1998), who, in their seminal study, had participants search for a target letter "T" (left- or right-rotated) among a set of (orthogonally rotated) distractor letters "L". In half of the trials, the spatial arrangements of the distractor and target stimuli were repeated, permitting participants to learn the invariant distractor-target relations to guide their search (repeated/predictive displays). In the other half, the distractors were distributed randomly on each trial, rendering their arrangement non-predictive of the target's position in the search array (non-repeated/non-predictive displays). Chun and Jiang's (1998) critical finding was that the reaction times (RTs) taken to find and respond to the target were faster for repeated vs. non-repeated display arrangements or "contexts".
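To make the dependent measure concrete, the following is a minimal sketch of how a contextual-cueing effect is commonly quantified: mean correct RT on non-repeated displays minus mean correct RT on repeated displays, computed per block of trials ("epoch"). The tuple layout `(epoch, repeated, rt_ms, correct)` and the function name are hypothetical, and the outlier trimming that actual analyses usually apply is deliberately left out.

```python
from collections import defaultdict
from statistics import fmean


def cueing_effect_by_epoch(trials):
    """Contextual-cueing effect per epoch: mean RT(non-repeated) - mean RT(repeated).

    `trials` is an iterable of (epoch, repeated, rt_ms, correct) tuples
    (a hypothetical record format). Only correct trials enter the means;
    positive values indicate facilitation by repeated contexts.
    Outlier trimming is omitted in this sketch.
    """
    rts = defaultdict(lambda: {True: [], False: []})
    for epoch, repeated, rt_ms, correct in trials:
        if correct:
            rts[epoch][repeated].append(rt_ms)
    return {
        epoch: fmean(cond[False]) - fmean(cond[True])
        for epoch, cond in sorted(rts.items())
        if cond[True] and cond[False]  # both context types needed in the epoch
    }
```

Computing the effect per epoch, rather than once over the whole session, is what allows the learning curve (the gradual emergence of facilitation) to be tracked.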
This effect, referred to as "contextual cueing", was subsequently confirmed and elaborated in a plethora of studies using behavioral, computational, and neuroscientific measures (Chun and Jiang, 1999; Chun, 2000; Shi et al., 2013; Zinchenko et al., 2020; Chen et al., 2021a). First and foremost, of course, effective contextual cueing requires successful retrieval of the respective (search-guiding) LT-memory representation. Thus, for example, when the time for which the spatial distractor-target layout can be viewed is limited, or when encoding of the display layout is hampered by competing visual task demands (Manginelli et al., 2013), the retrieval of acquired context memories may be prevented, abolishing contextual facilitation. Interestingly, contextual cueing is not limited to the visual modality: tactile predictive contexts can facilitate search, too. For instance, Assumpção et al. (2015) showed that people become better at finding an odd-one-out vibrotactile target within arrays of repeated vs. non-repeated (homogeneous) vibrotactile distractor items delivered to participants' fingertips (where the vibrotactile distractor-target arrangements consisted of two stimulated fingers, excepting the thumb, on each hand). As revealed by postural manipulations of the hands (Assumpção et al., 2018), tactile contextual cueing is rooted in a somatotopic reference frame: spatial target-distractor associations acquired during training transfer to a test phase (with crossed or flipped hands) only if the target and distractors are located at the same fingers, rather than at the same external spatial locations. This finding implies that search in repeated vs. non-repeated tactile distractor-target arrangements evokes, in default mode, a somatotopic reference frame, which differs from the (default) spatiotopic encoding of distractor-target relations in visual search (Chua and Chun, 2003).
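The logic of the hand-posture manipulation can be illustrated with a small sketch: the same learned configuration keeps its somatotopic code (which finger carries which item) when the hands are crossed, whereas its spatiotopic code (which external location carries which item) changes. The finger labels, the eight-slot left-to-right layout, and the modelling of crossing as a side swap without hand rotation are simplifying assumptions made for illustration only; the flipped-hands manipulation is not modelled.

```python
# Hypothetical, simplified geometry: eight fingers (thumbs excluded) resting
# palm-down in a row; slots 0-7 run from the participant's left to right.
LEFT_HAND = ["L-little", "L-ring", "L-middle", "L-index"]
RIGHT_HAND = ["R-index", "R-middle", "R-ring", "R-little"]


def external_slots(posture):
    """Map each finger to an external slot for 'uncrossed' or 'crossed' hands."""
    if posture == "uncrossed":
        order = LEFT_HAND + RIGHT_HAND
    elif posture == "crossed":
        # Assumption: crossing swaps the hands' sides without rotating them.
        order = RIGHT_HAND + LEFT_HAND
    else:
        raise ValueError(f"unknown posture: {posture}")
    return {finger: slot for slot, finger in enumerate(order)}


def somatotopic_code(config):
    """Somatotopic code: which finger carries which item (posture-independent)."""
    return dict(config)


def spatiotopic_code(config, posture):
    """Spatiotopic code: which external slot carries which item (posture-dependent)."""
    slots = external_slots(posture)
    return {item: slots[finger] for item, finger in config.items()}


# One learned configuration with two stimulated fingers per hand.
config = {"target": "L-index", "d1": "L-ring", "d2": "R-index", "d3": "R-little"}
```

Under these assumptions, `somatotopic_code(config)` is identical for both postures, while `spatiotopic_code(config, "crossed")` moves the target from slot 3 to slot 7. Transfer of cueing only when the finger assignment is preserved therefore points to the somatotopic code.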
However, while the learning of statistical co-occurrences of target and distractor items is bound to the currently task-relevant sensory modality, the brain has the ability to adapt and reorganize connectivity between different sensory modalities in response to consistent changes in input or experience, referred to as "crossmodal plasticity" (Bavelier and Neville, 2002; Nava and Röder, 2011). Thus, an interesting question arises, namely, whether the encoding of statistical regularities in one modality would facilitate search in another modality through the engagement of crossmodal-plasticity mechanisms. For instance, given that optimal task performance may depend on the use of all available sources of information, spatial learning in the tactile modality might be enhanced by congruent, redundant-signal information in the visual modality (Ho et al., 2009), and this may involve changes in the strength (and number) of connections between neurons in the visual and somatosensory regions of the brain. The possibility of such crossmodal spatial-regularity (contextual) learning is the question at issue in the current study.

1.2 Crossmodal contextual cueing across visual and tactile modalities
Initial evidence indicates that the mechanisms underlying contextual cueing may support the functional reorganization of one sensory modality following statistical learning in another modality. For example, Kawahara (2007) presented participants with meaningless speech sounds followed by a visual search display during a training phase. The location of the search target was predictable from the preceding auditory stimulus. In the subsequent test phase, this auditory-visual association was either removed for one (inconsistent-transfer) group or maintained for another (consistent-transfer) group. The results revealed search RTs to be increased for the inconsistent-transfer group but decreased for the consistent-transfer group, suggesting that visual attention can be guided implicitly by crossmodal associations. In another study, Nabeta et al. (2003) had their participants first search visually for a T-type target among L-type distractors in a learning phase, which was followed by a test phase in which they had to search haptically for T- vs. L-shaped letters. The haptic search arrays (which were carved on wooden boards and covered by an opaque curtain) were arranged in the same or different configurations compared to the visual displays during initial visual learning. Nabeta et al. (2003) found that target-distractor contexts learned during visual search also facilitated haptic search in the absence of visual guidance. It should be noted, though, that Nabeta et al.'s haptic search involved active exploration, with serial finger movements to sense the local items. Haptic search may thus have required participants to set up and continually update a visual working-memory representation of the scene layout, and the initially learned contexts may have come to interact with this representation, guiding the haptic exploration toward the target location.
However, such a strategy would not be available in tactile search scenarios involving spatially parallel, passive sensing, such as those explored by Assumpção et al. (2015, 2018). Passive tactile sensing and active manual exploration have been shown to involve distinct processes (Lederman and Klatzky, 2009). Accordingly, being based on active exploration, the findings of Nabeta et al. (2003) provide no clear answer as to whether and how target-distractor contexts acquired during visual search would transfer to parallel, passive tactile search.

Frontiers in Cognition frontiersin.org

FIGURE 1
Illustration of the experimental set-up. Panel (A) illustrates the height difference between the visual and tactile presentation planes. Visual stimuli were presented on a white canvas surface tilted about 20° toward the observer. The viewing distance was 60 cm. Participants placed their fingers (except the thumbs) on the eight solenoid actuators and responded to the identity of the tactile singleton target via a designated foot pedal. Panel (B) depicts the visual-tactile stimuli in a tactile search task. The search display consisted of one tactile target (the dark "spark") with seven homogeneous distractor vibrations (light gray circles), accompanied by a configuration of four distractor Gabor patches (and four empty circles). The locations of the tactile target for the tactile search and of the Gabor patches for the visual search varied depending on whether the displays were repeated or not. In the real setting, the hands were placed on the plane below the visual plane, as illustrated in Panel (A). Panel (C) depicts the waveforms of the two possible tactile targets in a tactile search task. The upper panel depicts the waveform of target 1 (T1): a 5-Hz square wave with a 30% duty cycle delivered via 150-Hz vibrations. The lower panel illustrates the waveform of target 2 (T2): a 5-Hz square wave with an average 60% duty cycle, also composed of 150-Hz vibrations. The distractors were constant vibrations of 150 Hz. Panel (D) depicts the visual-tactile stimuli in a visual search task. One visual target was embedded among seven homogeneous distractors, with a configuration of four vibrotactile stimulations delivered to two (selected) fingers (gray circles) of each hand.
Recently, Chen et al. (2021b) aimed to directly address this question by adopting a tactile-search paradigm similar to that of Assumpção et al. (2015, 2018), delivering vibrotactile stimulation to participants' fingertips instead of requiring active manual exploration. In addition, the visual search displays, projected on a white canvas on top of the tactile array, were collocated with the tactile stimuli (Figures 1A, B). The visuotactile search arrays were constructed in such a way that only the visual configuration was predictive of the tactile target location (Figure 1C). Chen et al. (2021b) found that repeated visual contexts came to facilitate tactile search as the experiment progressed, but only if the tactile items were presented some 250-450 ms prior to the visual elements (Figure 2A). Chen et al. (2021b) attributed this tactile preview time to the need to recode the (somatotopically sensed) tactile array in a visual reference frame, in order for search to benefit from the predictive context provided by the visual distractor elements (sensed in spatiotopic format).
Using a similar visual-tactile setup (Figure 1A), Chen et al. (2020) investigated whether a predictive tactile context could facilitate visual search. Participants had to search for a visual odd-one-out target, a Gabor patch differing in orientation (clockwise or counter-clockwise) from seven homogeneous vertical Gabor distractors (see Figure 1D). Critically, unbeknown to participants, visual targets were paired with repeated tactile contexts in half of the trials, and with newly generated tactile contexts in the other half. Again, the visual-tactile display onset asynchrony was varied. Similar to Chen et al. (2021b), the repeated tactile context had to be presented before the visual target in order for crossmodal contextual cueing to manifest, again suggesting that a preview time was required for remapping the somatotopically encoded tactile context into the visual spatiotopic reference frame in which the target is encoded. Of note, in a control experiment, Chen et al. (2020) found that when participants flipped their hands, but the visual target and tactile distractors were kept unchanged with respect to somatotopic coordinates, the crossmodal contextual-cueing effect was diminished (Figure 2B). This supports the idea that, with multisensory presentations, the predictive tactile context was remapped into a spatiotopically organized, visual (i.e., target-appropriate) format (Kennett et al., 2002; Kitazawa, 2002; see also Azañón and Soto-Faraco, 2008; Heed et al., 2015).
But is the remapping process still helpful when predictive contexts are concurrently available in two sensory modalities? Recently, Chen et al. (2021a) investigated this issue by presenting redundant visual-tactile contexts intermixed with single visual contexts in a visual search task. Following Chen et al. (2020), the tactile context was presented 450 ms prior to the visual context to promote tactile-to-visual remapping. Interestingly, Chen et al. (2021a) found that contextual facilitation of search was increased with multisensory, i.e., visuotactile, contexts relative to predictive visual contexts alone, suggesting that multisensory experiences facilitate unisensory learning.
Taken together, previous studies (Chen et al., 2020, 2021a) investigating visual and tactile search in multisensory arrays consisting of visual and tactile items established that contextual cues available in one (distractor) modality can be utilized in the other (target) modality. Further, redundant contexts consisting of identically positioned visual and tactile elements can enhance learning of the relational position of the visual target item over and above that deriving from predictive visual contexts alone.

1.3 ERP evidence on crossmodal contextual cueing
Evidence for crossmodal cueing also comes from recent electrophysiological studies (Chen et al., 2022a,b). For example, using the crossmodal search paradigm sketched in Figure 1, Chen et al. (2022a) found that, in a tactile search task, facilitation of search RTs by repeated visual contexts was also seen in well-established electrophysiological markers of the allocation of visuospatial attention, in particular, the N2pc (Luck et al., 2000) and CDA (Töllner et al., 2013) measured at parietal-posterior ("visual") electrodes; however, the lateralized event-related potentials (ERPs) in the respective time windows were less marked at central ("somatosensory") electrodes (Figure 2C). In contrast, statistical learning of the unimodal (tactile) context led to enhanced attention allocation (indexed by the N1/N2cc/CDA) at central ("somatosensory") electrodes, whereas these effects were less prominent at posterior ("visual") electrodes. These findings indicate that both somatosensory and visual cortical regions contribute to contextual cueing of tactile search, but their involvement is differentially weighted depending on the sensory modality that contains the predictive context information. That is, there is a stronger reliance on, or weighting of, either a visual or a somatotopic coordinate frame, depending on the sensory regularities currently available to support contextual cueing in tactile-visual search environments. Also worth mentioning is the work of Chen et al. (2021b), who observed that crossmodal (tactile) context learning in visual search resulted in enhanced amplitudes (and reduced latencies) of the lateralized N2pc/CDA waveforms at posterior ("visual") electrodes (see Figure 2D); both components correlated positively with the RT facilitation. These effects were comparable to those in the unimodal (visual-context) cueing conditions. In contrast, motor-related processes indexed by the response-locked LRP at central ("somatosensory") electrodes contributed little to the RT effects. This pattern indicates that the crossmodal-tactile context is encoded in a visual format for guiding visual search.
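Lateralized components such as the N2pc, N2cc, and CDA are derived from difference waves: the signal at an electrode contralateral to the target side minus the signal at its homologous ipsilateral electrode, averaged across trials and summarized as a mean amplitude within a component-specific time window. The sketch below shows only that arithmetic. The electrode pairs mentioned in the docstring, the sampling rate, and the window bounds are illustrative assumptions, not parameters taken from the cited studies.

```python
from statistics import fmean


def contra_minus_ipsi(target_side, left_wave, right_wave):
    """Single-trial difference wave: electrode contralateral minus ipsilateral to the target.

    `left_wave`/`right_wave` are equal-length voltage samples from a homologous
    left/right electrode pair (e.g., PO7/PO8 or C3/C4; the pairing is an assumption).
    """
    if len(left_wave) != len(right_wave):
        raise ValueError("electrode waveforms must have equal length")
    if target_side == "left":
        contra, ipsi = right_wave, left_wave
    elif target_side == "right":
        contra, ipsi = left_wave, right_wave
    else:
        raise ValueError("target_side must be 'left' or 'right'")
    return [c - i for c, i in zip(contra, ipsi)]


def average_difference_wave(trials):
    """Average the contra-minus-ipsi wave over (target_side, left, right) trials."""
    diffs = [contra_minus_ipsi(side, left, right) for side, left, right in trials]
    return [fmean(samples) for samples in zip(*diffs)]


def mean_amplitude(wave, srate_hz, epoch_start_ms, win_start_ms, win_end_ms):
    """Mean amplitude of `wave` within the window [win_start_ms, win_end_ms)."""
    first = round((win_start_ms - epoch_start_ms) * srate_hz / 1000)
    last = round((win_end_ms - epoch_start_ms) * srate_hz / 1000)
    return fmean(wave[first:last])
```

Because left- and right-target trials are folded onto a common contralateral/ipsilateral axis, activity that does not depend on the target side cancels out, leaving the attention-related lateralization.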

1.4 Goals of the current study
The studies reviewed thus far show that search is not "ahistoric". Rather, LT-memory representations of the searched-for target's relational position within a repeatedly encountered distractor context are accumulated across trials, and then expedite behavioral RTs and enhance lateralized ERP markers, both reflecting the more effective allocation of attention in repeated search displays. Importantly, statistical LT memories can be established in a crossmodal fashion, enabling re-occurring distractor configurations in one sensory modality (e.g., touch) to facilitate search in another (e.g., visual) modality. Theoretically, there are at least two principal accounts for this. One possibility is that crossmodal adaptation processes are set by the sensory modality that is dominant for a given performance function. Accordingly, given that spatial judgments are the province of the visual modality, items from non-visual modalities will be remapped into the coordinate system of this modality in spatial learning tasks (hypothesis 1). An alternative possibility is that functional reorganization of modalities is contingent on the modality that is relevant to the task at hand, i.e., the modality in which the target is defined (hypothesis 2). Critically, these two possibilities make opposite predictions regarding measurable indices of crossmodal learning in a tactile search task with redundant (i.e., both tactile and visual) distractor items presented in consistent, and thus learnable, configurations throughout performance of the task (see below for details). Hypothesis 1 predicts the remapping of the tactile items into a visual format and, thus, crossmodal facilitation of unisensory learning. In contrast, hypothesis 2 predicts no, or at best a minimal, benefit deriving from the presence of additional visual predictive distractors alongside the tactile predictive distractors in a tactile search task.
To decide between these alternatives, the present study implemented a tactile search task in which both the visual and the tactile context were predictive of the target location (on multisensory trials), in order to investigate which context would be learned and in which modality-specific coordinate system it would be encoded and retrieved to facilitate performance.
In more detail, we conducted two experiments (differing only in the stimulus-onset asynchrony, SOA, between the visual and tactile displays) to examine the impact of multisensory, visuotactile (relative to unisensory, tactile-only) contexts on contextual learning in a tactile search task. Adopting a well-established, and demonstrably successful, multisensory learning protocol (Seitz et al., 2006; Kim et al., 2008; Shams et al., 2011; Chen et al., 2021a), observers had to search for and respond to a tactile odd-one-out target item appearing together, in a configuration, with three homogeneous tactile distractor items (see Figure 3). In 50% of the trials, the target-distractor configuration was fixed, i.e., the target appeared at a fixed location relative to the consistent distractor context (there were four such predictive, i.e., learnable, contexts); in the other 50%, while the target position was also fixed, the locations of the distractors were randomly generated anew on each trial (there was the same number of such non-predictive contexts). Using this basic set-up, we tested contextual cueing in two separate phases: a pure unisensory phase and a mixed, uni- plus multisensory phase. In the unisensory phase, search was performed under the pure (unisensory) tactile task conditions just described; in the mixed phase, by contrast, trials with unisensory tactile stimulus arrays were presented randomly intermixed with trials with multisensory visuotactile contexts (the random mixing of trials ensured that participants adopted a consistent set to search for a tactile target). On the latter, visuotactile trials, the visual stimuli consisted of a configuration of three uniform distractor Gabor patches and one odd-one-out target Gabor patch, which were collocated with the positions of the tactile distractor and target stimuli.
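The repeated/non-repeated design described above can be sketched as a trial generator. This is a minimal illustration, not the authors' experiment code: finger indices 0-3 (left hand) and 4-7 (right hand) are hypothetical, the use of disjoint target fingers for repeated and non-repeated contexts is an assumption, and so is balancing the visuotactile trials of the mixed phase equally across the two context types within each block.

```python
import random

# Hypothetical finger indices (thumbs excluded): 0-3 left hand, 4-7 right hand.
LEFT, RIGHT = [0, 1, 2, 3], [4, 5, 6, 7]
FINGERS = LEFT + RIGHT


def sample_layout(rng, target):
    """Three distractor fingers such that each hand carries two stimulated fingers."""
    target_hand, other_hand = (LEFT, RIGHT) if target in LEFT else (RIGHT, LEFT)
    mate = rng.choice([f for f in target_hand if f != target])
    return tuple(sorted([mate, *rng.sample(other_hand, 2)]))


def make_session(seed=0):
    """Fix 4 repeated (target, distractors) configurations and 4 non-repeated targets.

    Assumption: repeated and non-repeated conditions use disjoint target fingers.
    """
    rng = random.Random(seed)
    targets = rng.sample(FINGERS, 8)
    repeated = [(t, sample_layout(rng, t)) for t in targets[:4]]
    return rng, repeated, targets[4:]


def make_block(rng, repeated, nonrep_targets, mixed_phase):
    """One block: each repeated configuration once, and each non-repeated target
    once with freshly sampled distractors. In the mixed phase, half of each context
    type (an assumed balancing) additionally receives collocated visual items."""
    trials = [{"context": "repeated", "target": t, "distractors": d} for t, d in repeated]
    trials += [
        {"context": "non-repeated", "target": t, "distractors": sample_layout(rng, t)}
        for t in nonrep_targets
    ]
    for tr in trials:
        tr["visual"] = False
    if mixed_phase:
        for ctx in ("repeated", "non-repeated"):
            idx = [i for i, tr in enumerate(trials) if tr["context"] == ctx]
            for i in rng.sample(idx, len(idx) // 2):
                trials[i]["visual"] = True
    rng.shuffle(trials)
    return trials
```

Keeping the target position fixed in the non-repeated condition as well means that the only difference between the two context types is the predictiveness of the distractor layout, so any RT difference cannot be attributed to target-location probability.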
It is important to note that, in visuotactile studies of contextual cueing, the visual and tactile stimuli need to be collocated, which necessarily limits the number of (collocated) stimuli in the display. Nevertheless, previous work from our group has consistently shown reliable cueing effects using this multimodal set-up (Chen et al., 2020, 2021b, 2022a), and reliable cueing has also been obtained with easy, "pop-out" visual search tasks (Geyer et al., 2010; Harris and Remington, 2017). Thus, by comparing contextual facilitation of RTs in tactile search with redundant, i.e., visual and tactile, distractor contexts vs. single, i.e., tactile-only, distractor contexts, we aimed to decide between the two alternative accounts (outlined above) of crossmodal contextual cueing in search tasks.
2 Materials and methods

2.1 Participants
Twenty-eight university students were recruited and randomly assigned to Experiment 1 (14 participants; six males; M = 27.4 years, SD = 5.1 years) or Experiment 2 (14 participants; eight males; M = 25.8 years, SD = 3.95 years); all were right-handed, had normal or corrected-to-normal vision, and reported normal tactile sensation. The sample sizes were determined by an a-priori power analysis based on the effect size (dz = 0.81) of a facilitatory effect of multisensory statistical learning in a similar study of multisensory contextual cueing (Chen et al., 2021a). According to the power estimates computed with G*Power (Erdfelder et al., 1996), a minimum sample size of 13 participants was required (α = 0.05, power = 0.85). All participants provided written informed consent before the experiment and were paid 9.00 Euro per hour for their services. The study was approved by the Ethics Committee of the LMU Munich Faculty of Psychology and Pedagogics.
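The a-priori power analysis can be reproduced in outline without G*Power. The sketch below computes the power of a matched-pairs t test from the noncentral t distribution (noncentrality dz·√n, df = n − 1) and searches for the smallest n that reaches the target power. Assumptions: a one-sided test (the text does not state the sidedness), and the noncentral t probability obtained by numerically integrating over the chi-square variable using only the Python standard library; G*Power's exact routine may differ slightly in the last decimals.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def _chi2_pdf(v, df):
    """Density of a chi-square distribution with `df` degrees of freedom."""
    if v <= 0.0:
        return 0.0
    k = df / 2.0
    return math.exp((k - 1.0) * math.log(v) - v / 2.0 - k * math.log(2.0) - math.lgamma(k))


def prob_t_exceeds(c, df, delta, steps=2000):
    """P(T > c) for T = (Z + delta) / sqrt(V / df), Z ~ N(0, 1), V ~ chi2(df).

    delta = 0 gives the central t distribution, delta > 0 the noncentral one.
    The integral over V is approximated with the midpoint rule.
    """
    upper = df + 40.0 * math.sqrt(2.0 * df)  # chi-square mass beyond this is negligible
    h = upper / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        # Conditional on V = v:  T > c  <=>  Z > c * sqrt(v / df) - delta
        tail = _STD_NORMAL.cdf(delta - c * math.sqrt(v / df))
        total += tail * _chi2_pdf(v, df)
    return total * h


def critical_t(df, alpha):
    """One-sided critical value c with P(T > c) = alpha under H0, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if prob_t_exceeds(mid, df, 0.0) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def paired_t_power(n, dz, alpha=0.05):
    """Power of a one-sided matched-pairs t test with n participants."""
    df = n - 1
    return prob_t_exceeds(critical_t(df, alpha), df, dz * math.sqrt(n))


def min_sample_size(dz, alpha=0.05, target_power=0.85, n_max=200):
    """Smallest n (>= 3) whose power reaches `target_power`."""
    for n in range(3, n_max + 1):
        if paired_t_power(n, dz, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within n_max")
```

With dz = 0.81, α = 0.05 and a target power of 0.85, `min_sample_size(0.81)` should land close to the 13 participants reported above, under the one-sided assumption; a two-sided test would call for a larger sample.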

FIGURE 3
(A) An example stimulus sequence of a multisensory-visuotactile trial in the mixed uni- and multisensory phase of Experiment 1. After the initial auditory beep and fixation marker, tactile stimuli were presented for 350 ms prior to the onset of the visual items. In Experiment 2, the visual display was presented 200 ms earlier than the tactile display. The dark "star" represents the tactile singleton (target) finger, and the light gray disks the non-singleton (distractor) fingers. The four visual items were Gabor patches presented at locations corresponding to the stimulated fingers. The visual target was the single left- vs. right-tilted Gabor patch among the three vertical distractor Gabor patches. Observers' task was to discriminate the tactile target-frequency pattern (T1 vs. T2) by pressing the corresponding foot pedal. Stimuli were presented for a fixed maximum duration. A feedback display was presented after the response. (B) On unisensory/multisensory-tactile trials, only tactile stimuli were presented.

2.2 Apparatus and stimuli
Both experiments were conducted in a sound-attenuated testing chamber, dimly lit by indirect incandescent lighting, with a Windows computer using Matlab routines and Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997). The tactile and visual items were presented at spatially corresponding locations on vertically offset (i.e., a lower, tactile and an upper, visual) presentation planes (Figure 1A). Visual stimuli (and task instructions/feedback) were projected onto a white canvas in front of the participant, using an Optoma projector (HD131Xe; screen resolution: 1,024 × 720 pixels; refresh rate: 60 Hz) mounted on the ceiling of the experimental booth. The canvas was fixed on a wooden frame and tilted about 20° toward the observer. The viewing distance was fixed at about 60 cm using a chin rest. Responses were recorded using foot pedals (Heijo Research Electronics, UK).
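Because the visual items are specified in degrees of visual angle at a viewing distance of about 60 cm (1.8° per item and about 1.9° between adjacent items, as given in the stimulus description), the corresponding physical extents follow from the standard relation size = 2·d·tan(θ/2). The sketch below performs this conversion; it assumes a fronto-parallel viewing geometry and ignores the roughly 20° tilt of the canvas.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (deg) subtended by an object of `size_cm` at `distance_cm`."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def size_for_angle_cm(angle_deg, distance_cm):
    """Physical size (cm) needed to subtend `angle_deg` at `distance_cm`."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


VIEWING_DISTANCE_CM = 60.0  # from the text; the canvas tilt is ignored here
item_cm = size_for_angle_cm(1.8, VIEWING_DISTANCE_CM)     # item diameter
spacing_cm = size_for_angle_cm(1.9, VIEWING_DISTANCE_CM)  # spacing of adjacent items
```

This gives roughly 1.89 cm per item and 1.99 cm between items, roughly comparable to the 1.8-cm actuator diameter and 2-cm actuator distance, which is what the collocation of visual and tactile items requires.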
Participants placed their eight fingers (except the thumbs) on eight solenoid actuators (each of a diameter of 1.8 cm, with a distance of 2 cm between adjacent actuators; see also Assumpção et al., 2015; Chen et al., 2020). Upon magnetization of the solenoid coils, the actuators drove lodged metal tips, vibrating a pin by 2-3 mm; the coils were controlled by a 10-channel amplifier (Dancer Design) connected to the computer via a MOTU analog-output card. Four vibrotactile stimulations were presented to four fingers, two of each hand, with the tactile target delivered to one finger and tactile distractors to the three other fingers. Each distractor was a constant 150-Hz vibration, while the target was one of the following vibration patterns (see Figure 1C): target 1 (T1) was a 5-Hz square wave with a 30% duty cycle, composed of 150-Hz vibrations, and target 2 (T2) a 5-Hz square wave with an average 60% duty cycle, also made up of 150-Hz vibrations. To make T2 distinguishable from T1, a blank gap of 200 ms was inserted after every two cycles of T2 (the mean frequency of T2 was thus 3.3 Hz). Visual stimuli on multisensory, visuotactile trials consisted of four Gabor patches (striped black and white; Michelson contrast of 0.96, spatial frequency of 2 cpd) and four empty circles, each subtending 1.8° of visual angle, presented on a gray background (36.4 cd/m²). Of the four Gabor patches, one was an odd-one-out orientation item, deviating by +30° or −30° from the vertical: the visual "target"; the other three were orientation-homogeneous, vertical visual "distractor" patches (see also Chen et al., 2021a). The visual Gabor and empty-circle items were presented at the eight "virtual" (i.e., collocated) finger positions on the upper display plane, with a distance of about 1.9° of visual angle between adjacent items.
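The timing of the three vibration patterns can be made explicit with a short sketch that gates a 150-Hz sine carrier with an on/off envelope at millisecond resolution. Assumptions: T2 is modelled as two 200-ms cycles that are each 60% on, followed by a 200-ms gap (a 600-ms period), and the 8-kHz sample rate and sine carrier are illustrative choices rather than the actual drive signal sent to the amplifier.

```python
import math

CARRIER_HZ = 150  # all stimuli are built from 150-Hz vibrations


def envelope_on(ms, kind):
    """Whether the 150-Hz carrier is on at integer time `ms` for a given stimulus.

    Assumed timing: T1 = 5-Hz square wave (200-ms cycle), 30% on; T2 = 5-Hz
    cycles, 60% on, with a 200-ms blank gap after every two cycles (600-ms period).
    """
    if kind == "distractor":
        return True                           # constant 150-Hz vibration
    if kind == "T1":
        return ms % 200 < 60                  # 60 of 200 ms on = 30% duty cycle
    if kind == "T2":
        pos = ms % 600
        return pos < 400 and pos % 200 < 120  # 120 of 200 ms on, then a 200-ms gap
    raise ValueError(f"unknown stimulus kind: {kind}")


def waveform(kind, duration_ms, fs=8000):
    """Drive signal: a 150-Hz sine carrier gated by the on/off envelope."""
    n = duration_ms * fs // 1000
    return [
        math.sin(2 * math.pi * CARRIER_HZ * i / fs) if envelope_on(i * 1000 // fs, kind) else 0.0
        for i in range(n)
    ]


def pulse_rate_hz(kind, duration_ms):
    """Number of burst onsets per second over `duration_ms`."""
    onsets, prev = 0, False
    for ms in range(duration_ms):
        on = envelope_on(ms, kind)
        if on and not prev:
            onsets += 1
        prev = on
    return onsets / (duration_ms / 1000)
```

Counting burst onsets reproduces the arithmetic in the text: T1 yields 5 bursts per second, whereas T2 yields two bursts per 600 ms, i.e., about 3.3 Hz.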
The "target" and "distractor" Gabor positions exactly matched those of the vibrotactile target and distractor stimuli, i.e., the response-relevant tactile target position was signaled redundantly by a collocated visual Gabor singleton. Importantly, crossmodally redundant target-location signaling was realized with both predictive and non-predictive distractor contexts. This also applied to the pairing of a particular vibrotactile target (T1 or T2) with a particular visual Gabor orientation (+30° or −30°); this pairing was fixed for a given participant (and counterbalanced across participants). Keeping these conditions the same for predictive and non-predictive distractor contexts was designed to rule out any potential influences of spatial and identity/response-based crossmodal correspondences (e.g., Spence and Deroy, 2013) on the dependent measure: contextual facilitation. During task performance, participants wore headphones (Philips SHL4000, 30-mm speaker drivers) delivering white noise (65 dBA) to mask the (otherwise audible) sound produced by the tactile vibrations. The white noise started and stopped together with the vibrations.

2.3 Procedure
Experiments 1 and 2 differed only in the stimulus-onset asynchrony (SOA) between the visual and tactile displays. In Experiment 1, the tactile display was presented 350 ms prior to the visual display, similar to the setting in our previous work (Chen et al., 2020, 2021a). Conversely, in Experiment 2, the visual display was presented 200 ms before the tactile display. A pilot experiment (run in preparation for the current study) with unisensory visual displays showed that a 200-ms presentation of the displays was sufficient to produce a contextual-cueing effect with four repeated target-distractor configurations. This is consistent with a previous study of ours (Xie et al., 2020), which demonstrated a contextual-cueing effect with a 300-ms presentation, even though the search displays were more complex (consisting of one T-shaped target among 11 L-shaped distractors) and there were 12 repeated target-distractor configurations. Moreover, evidence from neuro-/electrophysiological studies indicates that the allocation of spatial attention diverges between repeated and novel target-distractor configurations as early as 100-200 ms after display onset (e.g., Johnson et al., 2007; Chaumon et al., 2008; Schankin and Schubö, 2009).
To avoid potential response incompatibility, the response to the tactile target T1 or T2 (left or right pedal) was mapped to the orientation (left or right tilt) of the tactile-matched visual singleton, across phases. That is, if T1 was paired with the left-tilted Gabor patch in the mixed, uni- plus multisensory phase for a given participant, T1 was assigned to the left pedal, and T2 to the right pedal, for that participant.

2.3.1 Practice tasks
Participants first practiced the response mapping of the foot (i.e., response) pedals to the tactile targets (T1 or T2). The target-pedal assignment was fixed for each participant but counterbalanced across participants. The practice phase consisted of four tasks: (1) tactile target identification (32 trials); (2) tactile search (32 trials); (3) visual search (32 trials); and (4) multisensory search (64 trials, half of which presented only tactile targets and the other half redundantly defined, visuotactile targets). Participants had to reach a response accuracy of 85% in a given task before proceeding to the next task (all participants achieved this criterion with one round of training).
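The fixed-but-counterbalanced target-pedal assignment, combined with the rule that each tactile target's pedal follows the tilt of its paired Gabor patch, can be sketched as follows. The alternation of two groups by even/odd participant ID is a hypothetical scheme chosen for illustration; the text states only that the assignment was fixed within and counterbalanced across participants.

```python
def participant_mapping(participant_id):
    """Hypothetical counterbalancing (not the authors' code).

    Even IDs pair T1 with the left-tilted Gabor, odd IDs with the right-tilted one.
    Each tactile target's pedal follows the tilt of its paired Gabor, keeping the
    tactile and visual response codes compatible across phases.
    """
    other = {"left": "right", "right": "left"}
    t1_side = "left" if participant_id % 2 == 0 else "right"
    sides = {"T1": t1_side, "T2": other[t1_side]}
    return {
        target: {"gabor_tilt": side, "pedal": side}  # pedal follows the paired tilt
        for target, side in sides.items()
    }
```

With 14 participants per experiment, this scheme would place seven participants in each mapping group.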
In the tactile target-identification task, one vibrotactile target (either T1 or T2, lasting 6 s) was randomly delivered to one of the eight fingers. Participants had to discriminate the tactile target by pressing the corresponding foot pedal as quickly and accurately as possible. During this task, the tactile array was always accompanied by the correct target label, "T1" or "T2", on the screen, to aid identification of the tactile target (T1 vs. T2) and its mapping onto the required (left vs. right foot-pedal) response. In the tactile-search task, four vibrotactile stimuli, one target and three distractors, were delivered to two fingers of each hand. Participants had to identify T1 or T2 as quickly and accurately as possible by pressing the associated foot pedal. Given that the experimental task proper involved redundant visuotactile displays, the visual-search practice was designed to familiarize participants with the visual target (and distractor) stimuli and, thus, to ensure that these would not simply be ignored on multisensory trials in the experiment proper (in which the task could be performed based on the tactile stimuli alone). In the visual-search task, eight visual items (four Gabor patches and four empty circles) were presented on the screen. Participants were asked to identify the target Gabor orientation (tilted to the left or the right) as rapidly as possible by pressing the corresponding foot pedal.
In the practice of the search task under mixed, uni- and multisensory conditions, participants were presented with 50% unisensory tactile trials (identical to the tactile-search practice) and 50% multisensory visuotactile trials (each presenting one target and three distractors in both the visual and the tactile modality), randomly interleaved (see Figure 3). In Experiment 1, the visual items were presented 350 ms after the tactile stimuli; in Experiment 2, they were presented 200 ms prior to the tactile stimuli. Importantly, although the multisensory displays contained two collocated targets, one singled out in each sensory modality, participants were expressly instructed to set themselves for tactile search, even though the visual stimuli (provided only on multisensory trials) could provide cues for solving the tactile task. This instruction was meant to ensure that the "tactile" task set was identical across uni- and multisensory trials, allowing us to examine any beneficial effects of multisensory vs. tactile stimulation on statistical context learning. Because the task was tactile search, RTs were measured from the onset of the tactile stimuli in both experiments.
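Two details of this procedure can be made concrete: the 50/50 random interleaving of tactile-only and visuotactile trials (e.g., in the 64-trial multisensory practice), and the convention that RTs are timed from the tactile onset. The sketch below is illustrative only; a plain whole-sequence shuffle is an assumption, because any further randomization constraints are not described here, and the function names are hypothetical.

```python
import random
from typing import List


def mixed_phase_modalities(n_trials: int, rng: random.Random) -> List[str]:
    """Half tactile-only, half visuotactile trials, randomly interleaved."""
    if n_trials % 2:
        raise ValueError("n_trials must be even for an exact 50/50 split")
    labels = ["tactile"] * (n_trials // 2) + ["visuotactile"] * (n_trials // 2)
    rng.shuffle(labels)  # simple shuffle; no run-length constraints assumed
    return labels


def rt_from_tactile_onset(response_ms: float, tactile_onset_ms: float) -> float:
    """RT relative to the tactile display, whether or not the visual display came first."""
    return response_ms - tactile_onset_ms


seq = mixed_phase_modalities(64, random.Random(1))
print(seq.count("tactile"), seq.count("visuotactile"))  # 32 32

# Experiment 2: visual display at 0 ms, tactile display at 200 ms;
# a pedal press 1,450 ms after visual onset yields an RT of 1,250 ms.
print(rt_from_tactile_onset(1450, 200))  # 1250
```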

Experimental tasks
Immediately following the practice, each participant performed two experimental phases: a pure unisensory phase and a mixed, uni- and multisensory phase. The unisensory phase presented only tactile trials, whereas the mixed phase included both tactile and visuotactile trials, randomly intermixed. The repeated target-distractor configurations were identical for the tactile-only and visuotactile trials in the mixed phase. The trial procedure was the same as in the respective practice tasks (see Figure 3). Each trial began with a 600-Hz beep (65 dBA) for 300 ms, followed by a short fixation interval of 500 ms. A search display (tactile or visuotactile) was then presented until a foot-pedal response was made, or for a maximum of 6 s. Participants were instructed to respond to the tactile target as quickly and accurately as possible. Following the response, accuracy feedback (the word "correct" or "wrong") was presented in the center of the screen for 500 ms (Figure 3). After an inter-trial interval of 1,000 ms, the next trial began. Eight consecutive trials constituted one trial block, consisting of the four predictive display configurations plus four non-predictive configurations, in randomized order. After every two blocks, a double beep (2 × 200 ms, 1,000 Hz, 72 dBA, separated by an 800-ms silent interval) announced summary accuracy feedback, with the mean accuracy attained in the previous two blocks shown in the center of the screen for 1,000 ms.
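The trial timeline and block structure described above can be sketched as follows. This is an illustration under assumptions: `trial_duration_ms` ignores the visual-tactile SOA and the summary feedback given after every second block, and all function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Durations (ms) of the trial events described in the text.
BEEP_MS, FIXATION_MS = 300, 500
MAX_DISPLAY_MS, FEEDBACK_MS, ITI_MS = 6000, 500, 1000


def trial_duration_ms(rt_ms: Optional[float] = None) -> float:
    """Approximate trial length; no response (None) means a 6-s timeout.

    Ignores the visual-tactile SOA and the two-block summary feedback.
    """
    display_ms = MAX_DISPLAY_MS if rt_ms is None else min(rt_ms, MAX_DISPLAY_MS)
    return BEEP_MS + FIXATION_MS + display_ms + FEEDBACK_MS + ITI_MS


def make_block(repeated: Sequence[str], nonrepeated: Sequence[str],
               rng: random.Random) -> List[Tuple[str, str]]:
    """One block: the four repeated plus four non-repeated displays, shuffled."""
    if len(repeated) != 4 or len(nonrepeated) != 4:
        raise ValueError("a block holds four repeated and four non-repeated displays")
    trials = [("repeated", d) for d in repeated]
    trials += [("non-repeated", d) for d in nonrepeated]
    rng.shuffle(trials)
    return trials


def summary_feedback_due(block_index: int) -> bool:
    """True after every second block (block_index counts from 0)."""
    return block_index % 2 == 1
```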
Half of the participants started with the pure unisensory phase and the other half with the mixed, uni- and multisensory phase. Each phase consisted of the same number of trials [Experiment 1: 256 trials per phase, with 128 repetitions per (repeated/non-repeated display) condition for the pure unisensory phase and 64 repetitions per condition for the mixed, uni- plus multisensory phase; Experiment 2: 320 trials per phase, with 160 repetitions per condition for the pure unisensory phase and 80 repetitions per condition for the mixed phase], so as to equate the numbers of trials with tactile information between the pure unisensory and the mixed, uni- and multisensory phase. We increased the number of trials in Experiment 2 in order to extend the opportunity for contextual learning: would enhanced contextual facilitation by multisensory information become observable with more trials (i.e., repetitions of each predictive display arrangement) per learning "epoch"? Recall that each block presented each of the four predictive display arrangements once, intermixed with four non-predictive displays. So, in Experiment 2, an experimental epoch combined data across five blocks of trials (i.e., five repetitions of each predictive display), compared with four blocks (i.e., four repetitions of each predictive display) in Experiment 1.
Note that the tactile task set that participants were instructed to adopt was also the only set permitting the task to be performed consistently, without set switching, on both types of (randomly interleaved) trials.
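The trial counts above imply the same number of learning epochs in both experiments. The sketch below checks this arithmetic for the pure unisensory phase (in the mixed phase, the same trials are additionally divided between tactile-only and visuotactile trials); the function name `phase_layout` is hypothetical.

```python
from typing import Dict

TRIALS_PER_BLOCK = 8       # four repeated + four non-repeated displays
N_REPEATED_DISPLAYS = 4


def phase_layout(trials_per_phase: int, blocks_per_epoch: int) -> Dict[str, int]:
    """Blocks, epochs, and repetitions of each predictive display in one phase."""
    if trials_per_phase % TRIALS_PER_BLOCK:
        raise ValueError("phase length must be a whole number of blocks")
    n_blocks = trials_per_phase // TRIALS_PER_BLOCK
    if n_blocks % blocks_per_epoch:
        raise ValueError("blocks must divide evenly into epochs")
    return {
        "blocks": n_blocks,
        "epochs": n_blocks // blocks_per_epoch,
        "repeated_trials": n_blocks * N_REPEATED_DISPLAYS,
        # each predictive display appears once per block
        "reps_per_display_per_epoch": blocks_per_epoch,
    }


print(phase_layout(256, 4))  # Experiment 1: 32 blocks, 8 epochs, 128 repeated trials
print(phase_layout(320, 5))  # Experiment 2: 40 blocks, 8 epochs, 160 repeated trials
```

Both designs yield eight epochs, matching the Epoch factor (1-8) used in the analyses.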

Design
To balance stimulus presentations between the left and right sides, the search arrays always consisted of two distractors on one side, and one target and one distractor on the other side. There were 144 possible displays in total to sample from. For the repeated contexts, we randomly generated two different sets of four displays for each participant, one set for the pure unisensory phase (hereafter Set 1) and one for the mixed, uni- and multisensory phase (Set 2). Separate sets of repeated displays were generated to minimize potentially confounding transfer effects across phases. For "repeated" displays (of both sets), the target and distractor positions were fixed and repeated within each phase. For "non-repeated" displays, by contrast, the pairing of the target location with the three distractor positions was determined randomly in each block; these displays changed across blocks, making it impossible for participants to form spatial distractor-target associations. Note, though, that target locations were repeated equally often in non-repeated and repeated displays (see Figure 4). That is, in each block of four repeated and four non-repeated trials, four positions, two on each side, were used for targets in the repeated condition, and the remaining four positions (again two on each side) for non-repeated displays (we also controlled the eccentricity of the target locations to be the same, on average, for repeated and non-repeated trials; see Supplementary material for an analysis of eccentricity effects). This was designed to ensure that any performance gains in the "repeated" condition could only be attributed to repeated spatial distractor-target arrangements, rather than to repeated target locations (see, e.g., Chun and Jiang, 1998, for a similar approach).
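The figure of 144 displays follows from the layout constraint: choosing two of four fingers on each hand gives 6 × 6 = 36 location sets, and any of the four stimulated fingers can carry the target, giving 36 × 4 = 144. The sketch below enumerates this space and splits target locations two per hand between repeated and non-repeated displays. It is illustrative only: the helper names are hypothetical, and the eccentricity matching mentioned above is not implemented.

```python
import random
from itertools import combinations
from typing import List, Tuple

Location = Tuple[str, int]                        # (hand, finger index 0-3)
Display = Tuple[Location, Tuple[Location, ...]]   # (target, distractors)

LEFT: List[Location] = [("L", i) for i in range(4)]
RIGHT: List[Location] = [("R", i) for i in range(4)]


def all_displays() -> List[Display]:
    """Two stimulated fingers per hand; any of the four items can be the target."""
    displays = []
    for left_pair in combinations(LEFT, 2):
        for right_pair in combinations(RIGHT, 2):
            items = left_pair + right_pair
            for target in items:
                distractors = tuple(loc for loc in items if loc != target)
                displays.append((target, distractors))
    return displays


def split_target_locations(rng: random.Random) -> Tuple[List[Location], List[Location]]:
    """Two target locations per hand for repeated displays, the other four for non-repeated ones."""
    left, right = LEFT[:], RIGHT[:]
    rng.shuffle(left)
    rng.shuffle(right)
    return left[:2] + right[:2], left[2:] + right[2:]


def random_display_for(target: Location, pool: List[Display],
                       rng: random.Random) -> Display:
    """Pick a display with the given target and random distractor positions."""
    return rng.choice([d for d in pool if d[0] == target])


pool = all_displays()
rng = random.Random(7)
repeated_locs, nonrepeated_locs = split_target_locations(rng)
repeated_set = [random_display_for(t, pool, rng) for t in repeated_locs]        # fixed for a phase
block_nonrepeated = [random_display_for(t, pool, rng) for t in nonrepeated_locs]  # redrawn every block
print(len(pool))  # 144
```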

Data analysis
Trials with errors, or with RTs below 200 ms or more than three standard deviations above the mean, were excluded from the RT analysis. Mean accuracies and RTs were submitted to repeated-measures analyses of variance (ANOVAs) with the factors Modality (unisensory-tactile, multisensory-tactile, multisensory-visuotactile), Display (repeated vs. non-repeated), and Epoch (1-8; one experimental epoch combining data across four consecutive trial blocks in Experiment 1 and five blocks in Experiment 2). Greenhouse-Geisser-corrected values were reported when the sphericity assumption was violated (Mauchly's test, p < 0.05). When interactions were significant, least-significant-difference post-hoc tests were conducted for further comparisons. The contextual-cueing effect was defined as the RT difference between non-repeated and repeated displays. We conducted one-tailed t-tests to examine the significance of the contextual-cueing effect (i.e., testing it against zero), given that contextual cueing is, by definition, a directed effect: search RTs are expected to be faster for repeated vs. non-repeated search-display layouts (Chun and Jiang, 1998). We additionally report Bayes factors (inclusion Bayes factors, BFincl, for ANOVA effects) for non-significant results to further evaluate the null hypothesis (Jeffreys, 1961; Kass and Raftery, 1995).

FIGURE 4
Schematic illustration of the distribution of targets in repeated and non-repeated displays across search blocks. In repeated displays, the target location was constant and paired with constant distractor locations; in non-repeated displays, only the target locations, but not the distractor locations, were held constant across blocks.
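The exclusion rule and the definition of the cueing effect translate into a few lines of code. The sketch below assumes that the 3-SD cutoff is computed per participant over correct trials with RTs of at least 200 ms, and that the effect is coded as non-repeated minus repeated RT so that facilitation is positive; both are assumptions, and the function names are hypothetical. It computes only the one-sample t statistic; a one-tailed p-value could then be obtained, for example, with SciPy's `scipy.stats.ttest_1samp(effects, 0, alternative="greater")`.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Tuple

Trial = Tuple[float, bool, str]  # (rt_ms, correct, "repeated" | "non-repeated")


def trim_rts(trials: List[Trial]) -> List[Trial]:
    """Drop errors, RTs < 200 ms, and RTs more than 3 SD above the mean (assumed per participant)."""
    kept = [t for t in trials if t[1] and t[0] >= 200]
    rts = [t[0] for t in kept]
    if len(rts) < 2:  # SD undefined; nothing more to trim
        return kept
    cutoff = mean(rts) + 3 * stdev(rts)
    return [t for t in kept if t[0] <= cutoff]


def cueing_effect(trials: List[Trial]) -> float:
    """Mean RT on non-repeated minus repeated displays (positive = facilitation)."""
    rep = [rt for rt, _, display in trials if display == "repeated"]
    non = [rt for rt, _, display in trials if display == "non-repeated"]
    return mean(non) - mean(rep)


def one_sample_t(effects: List[float]) -> float:
    """t statistic for H1: mean cueing effect > 0 across participants (df = n - 1)."""
    return mean(effects) / (stdev(effects) / sqrt(len(effects)))
```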
Results and discussion
Accuracy
The mean accuracies in Experiment 1 (in which the visual items were presented after the tactile items) were 90, 91, and 94% for the unisensory-tactile, multisensory-tactile, and multisensory-visuotactile conditions, respectively. A repeated-measures ANOVA revealed no significant effects, all ps > 0.31, ηp² < 0.09, BFincl < 0.16.
In Experiment 2 (where the visual items were presented before the tactile items), the mean accuracies were 93, 91, and 97% for the unisensory-tactile, multisensory-tactile, and multisensory-visuotactile conditions, respectively. A repeated-measures ANOVA revealed the main effect of Modality to be significant, F(1.3, 16.94) = 7.63, p = 0.009, ηp² = 0.37: accuracy was higher for multisensory-visuotactile trials than for both unisensory-tactile and multisensory-tactile trials (two-tailed, ps < 0.008, dz > 0.83), with no significant difference between the latter two conditions (p = 0.37, dz = 0.25, BF10 = 0.39). Thus, accurate responding to the tactile target was generally enhanced by the preceding visual display (whether or not this was predictive). Further, accuracy was overall slightly higher for repeated (94.4%) vs. non-repeated (93.5%) displays, F(1, 13) = 4.66, p = 0.05, ηp² = 0.26, BFincl = 0.10, though the Bayes factor argues in favor of a null effect. No other effects were significant, all ps > 0.1, ηp² < 0.16, BFincl < 0.31.

RTs
Trials with extreme RTs were relatively rare: only 0.4% had to be discarded in Experiment 1 and 0.5% in Experiment 2. Figure 5A depicts the mean correct RTs for repeated and non-repeated displays as a function of Epoch, separately for the unisensory-tactile, multisensory-tactile, and multisensory-visuotactile trials, for Experiment 1 and Experiment 2. Visual inspection suggests that both experiments show a procedural-learning effect: a general (i.e., condition-non-specific) improvement of performance with increasing practice on the task. Importantly, in Experiment 2 there was a clear contextual-cueing effect (over and above the general performance gain) in the multisensory-visuotactile as well as the unisensory-tactile and multisensory-tactile search conditions (witness the differences between the corresponding solid and dashed lines), whereas in Experiment 1 there appeared to be no cueing effect in the multisensory-visuotactile condition. Recall that the only difference between Experiments 1 and 2 was the order in which the visual and tactile (context) stimuli were presented on multisensory-visuotactile trials: the visual context followed the tactile context in Experiment 1, whereas it preceded the tactile context in Experiment 2.
Experiment 1 thus showed that predictive tactile contexts alone could facilitate tactile search in both the pure unisensory and the mixed, uni- and multisensory phases of the experiment, whereas redundant predictive visuotactile contexts (with the visual display following the tactile array) failed to facilitate tactile search. Note that, in the mixed phase, the purely tactile and the visuotactile contexts involved exactly the same predictive tactile item configurations. Accordingly, the absence of contextual facilitation on visuotactile trials, which contrasts with the presence of facilitation on purely tactile trials (the two trial types being randomly interleaved), indicates that the lack of cueing on visuotactile trials is not due to a failure of contextual learning; instead, it is likely due to retrieval of successfully learnt contexts being blocked when the visual context is presented after the tactile search array, consistent with previous findings.
The significant Modality effect is interesting: it was due to the preceding visual display generally enhancing both response speed and accuracy (see the accuracy results above). However, this effect (in both RTs and accuracy) was independent of whether the visual context was predictive or non-predictive of the target location in the tactile array; that is, it did not affect the contextual-cueing effect (the Modality × Display interaction was non-significant). Thus, the visual display likely just acted as an additional "warning signal" (Posner, 1978), over and above the auditory beep and fixation cross at the start of the trial, boosting observers' general preparedness for processing the impending tactile array.
condition engendered less (if any) contextual facilitation in Experiment 1 than in Experiment 2. Given that the analysis unit of an "epoch" is somewhat arbitrary, and, arguably, to take procedural-learning effects into account, we compared the cueing effect between the very first epoch of learning (in which participants had encountered the repeated arrangements only a few times) and the very last epoch (by which they had had the maximum opportunity to acquire the contextual regularities), using an ANOVA on the normalized contextual-facilitation effects with the within-subject factors Modality and Epoch (Epoch 1, Epoch 8) and the between-subject factor Experiment. The results revealed a significant main effect of Modality, F(2, 52) = 4.87, p = 0.012, ηp² = 0.16, and a significant main effect of Epoch, F(1, 26) = 4.46, p = 0.04, ηp² = 0.15, with a larger cueing effect in Epoch 8 than in Epoch 1 (mean difference = 4.9%). No other main effects or interactions were significant, all ps > 0.15, ηp² < 0.08, BFincl < 0.22. This pattern indicates that contextual facilitation increased across the experiment for all three modality conditions, in both Experiments 1 and 2.
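This section does not spell out how the contextual-facilitation effects were normalized. A common choice, assumed here purely for illustration, expresses the cueing effect as a percentage of the non-repeated RT, so that epochs differing in overall speed can be compared; the function names below are hypothetical.

```python
from typing import Dict, Tuple


def normalized_cueing(rt_repeated: float, rt_nonrepeated: float) -> float:
    """Cueing effect as a percentage of the non-repeated RT (assumed normalization)."""
    return 100.0 * (rt_nonrepeated - rt_repeated) / rt_nonrepeated


def first_to_last_epoch_change(per_epoch: Dict[int, Tuple[float, float]]) -> float:
    """Epoch 8 minus Epoch 1 normalized cueing, in percentage points.

    per_epoch maps epoch number -> (mean RT repeated, mean RT non-repeated).
    """
    first = normalized_cueing(*per_epoch[1])
    last = normalized_cueing(*per_epoch[8])
    return last - first
```

Under this normalization, the same 50-ms absolute benefit counts as 5.0% when non-repeated RTs average 1,000 ms but only 2.5% when they average 2,000 ms, which is why normalized effects are less sensitive to the general speed-up with practice.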

General discussion
The present study examined which context would be learned, and in which modality it would be encoded and retrieved, when both visual and tactile contexts are in principle available to guide tactile search. To address this, in two experiments, we compared the impact of multisensory, relative to unisensory, predictive contexts on the performance of a tactile search task. The two experiments differed in the order in which the visual and tactile contexts were presented on multisensory-visuotactile trials: the visual context followed the task-critical tactile context in Experiment 1 but preceded it in Experiment 2. Critically, in the mixed uni- and multisensory phase of the task, we randomly intermixed tactile-only and visuotactile trials using identical predictive configurations in both trial types. Both experiments revealed reliable contextual cueing when the tactile context was shown alone, whether in a separate (unisensory) phase or randomly intermixed with visuotactile trials in the mixed (uni- and multisensory) phase, replicating previous findings (Assumpção et al., 2015, 2018). However, presenting identically positioned visual items together with the tactile target-distractor configuration on multisensory trials did not enhance the contextual-cueing effect over and above the presentation of the tactile array alone; that is, there was no redundancy gain from multisensory-visuotactile contexts. Indeed, the expression of the cueing effect was impeded when the visual display was presented after the tactile array in Experiment 1. We take the lack of a redundancy gain even under optimal conditions (with the visual display preceding the tactile array in Experiment 2) to indicate that, despite the availability of redundant visual and tactile predictive item configurations, statistical learning of distractor-target contingencies is driven (solely) by the task-relevant, tactile modality.
Presenting the visual display after the tactile array on multisensory trials in Experiment 1 abolished the contextual-cueing effect. Given that the same predictive tactile configurations significantly facilitated tactile search on tactile-only trials in the multisensory (i.e., mixed, uni- plus multisensory) phase of the experiment, the lack of a contextual-cueing effect on multisensory trials may be owing to the (delayed) presentation of the visual display interfering with tactile-context retrieval, likely by diverting attention away from the tactile modality (see also Manginelli et al., 2013; Zang et al., 2015). Whatever the precise explanation, the differential effects between Experiments 1 and 2 agree with the hypothesis that the modality selected for encoding contextual regularities is determined by the task at hand.
Recall that in the existing studies of crossmodal contextual cueing (Chen et al., 2020, 2021a,b, 2022a), search was either visual (Chen et al., 2020, 2021a, 2022b), or only visual predictive contexts were presented to inform tactile search (Chen et al., 2021b, 2022a). These studies consistently showed that learning predictive distractor contexts in one modality can facilitate search in the other, target modality, while highlighting the aptness of the spatiotopic visual reference frame for crossmodal spatial learning.
A question left open by these studies was how statistical context learning develops, in search for a tactile target, in the presence of redundant context stimuli encoded in different reference frames: in particular, predictive visual and predictive tactile contexts sensed in spatiotopic and somatotopic frames, respectively. We take the pattern of findings revealed in the present study to provide an answer: the spatiotopic reference frame of the visual modality is not the default system for multisensory contextual learning. Rather, when the task requires a search for a tactile target, contextual memories are formed within the somatotopic frame of the tactile modality, even when the target location is redundantly predicted by both the tactile and the visual item configuration.
In a previous study, Chen et al. (2021a) had observed enhanced contextual cueing when the task-critical visual item configuration was preceded by predictive tactile contexts (vs. predictive visual contexts alone) in a visual search task. Extrapolating from this result, a multisensory contextual redundancy gain would also have been expected in the present study, at least when the visual display preceded the tactile array. Chen et al. (2021a) argued that presenting the tactile context prior to the visual context in the visual search task permitted the predictive tactile array to be remapped into spatiotopic-external coordinates, i.e., the reference system of the visual modality. Accordingly, the remapped tactile-predictive array could be combined with the visual-predictive display, enhancing visual contextual cueing over and above the level rendered by the unisensory visual context alone (Kennett et al., 2002; Heed et al., 2015; Chen et al., 2021b). By analogy, in the present study, encoding of the preceding visual configuration could conceivably have engendered visual-to-tactile remapping, thus adding to the cue provided by the task-relevant tactile arrangement to enhance contextual facilitation (based on the common somatotopic reference system). However, our results are at odds with this possibility: although the prior onset of the visual array boosted performance (accuracy and speed) in general, it did not enhance contextual cueing. A likely reason for this is an asymmetry in coordinate-frame remapping: while somatotopic ("tactile") coordinates can be efficiently remapped into spatiotopic-external ("visual") coordinates, there may be no ready routines for remapping spatiotopic-external coordinates into somatotopic coordinates (Pouget et al., 2002; Eimer, 2004; Ernst and Bülthoff, 2004). Given this, the present findings demonstrate a limit of multisensory signal processing in contextual cueing: multisensory redundancy gains require that both the visual-predictive and the tactile-predictive contexts can be coherently represented in a reference frame that is supported by the task-critical, target modality. Our results show that predictive visual contexts fail to meet this (necessary) condition when the task requires a search for a tactile target.
We acknowledge a possible limitation of the current study, namely, that participants underwent only a relatively short multisensory phase of task performance. Recall that, even in this phase, the critical multisensory-visuotactile displays occurred on only half the trials (the other half being designed to enforce a tactile task set, as well as to provide a unisensory-tactile baseline condition against which to assess any multisensory-visuotactile redundancy gains). Thus, it cannot be ruled out that multisensory contextual facilitation of tactile search might be demonstrable with more extended training regimens (for indirect evidence of the facilitatory effect of consistent audio-visual training on the subsequent performance of a pure visual search task, see Zilber et al., 2014). Accordingly, with respect to the present visuotactile scenario, future work might examine whether tactile cueing of target-distractor regularities would be enhanced by concurrent visual-predictive items when an extended training schedule, perhaps coupled with a pre-/post-test design (cf. Zilber et al., 2014), is implemented.
In sum, when both visual-predictive and tactile-predictive contexts are provided in a tactile search task, the tactile context dominates contextual learning. Even giving the visual contexts a head start does not facilitate tactile learning, likely because there are no ready routines for remapping the visual item configuration into the somatotopic coordinates underlying the tactile task. We conclude that the task-critical (i.e., target) modality determines the reference frame for contextual learning, and that whether or not redundant predictive contexts provided by another modality can be successfully exploited (to enhance contextual cueing) depends on the availability of the requisite spatial remapping routines.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://osf.io/73ejx/.

Ethics statement
The studies involving human participants were reviewed and approved by LMU Munich Faculty of Psychology and Pedagogics. The patients/participants provided their written informed consent to participate in this study.