AUTHOR=Venezia, Jonathan H.; Vaden, Kenneth I.; Rong, Feng; Maddox, Dale; Saberi, Kourosh; Hickok, Gregory
TITLE=Auditory, Visual and Audiovisual Speech Processing Streams in Superior Temporal Sulcus
JOURNAL=Frontiers in Human Neuroscience
VOLUME=11
YEAR=2017
URL=https://www.frontiersin.org/journals/human-neuroscience/articles/10.3389/fnhum.2017.00174
DOI=10.3389/fnhum.2017.00174
ISSN=1662-5161
ABSTRACT=The human superior temporal sulcus (STS) responds to both visual and auditory information, including sounds and facial cues during speech recognition. We investigated the functional organization of the STS with respect to modality-specific and multimodal speech representations. Twenty young adult participants performed an oddball detection task while auditory, visual, and audiovisual speech stimuli, as well as auditory and visual nonspeech control stimuli, were presented in a block fMRI design. Consistent with a hypothesized anterior-posterior processing gradient in the STS, auditory, visual, and audiovisual stimuli produced the largest BOLD effects in anterior, posterior, and middle STS, respectively, based on whole-brain, linear mixed effects, and principal component analyses. Notably, the middle STS exhibited preferential responses to multisensory stimulation, as well as to speech compared to nonspeech. Within the mid-posterior and middle STS regions, response preferences shifted gradually from visual to multisensory to auditory, moving from posterior to anterior. Post hoc analysis of visual regions in the posterior STS revealed that a single subregion bordering the middle STS was insensitive to differences in low-level motion kinematics yet distinguished between visual speech and nonspeech based on multi-voxel activation patterns. These results suggest that auditory and visual speech representations are elaborated gradually within anterior and posterior processing streams, respectively, and may be integrated within the middle STS, which is sensitive to more abstract speech information within and across presentation modalities. The spatial organization of the STS is consistent with processing streams hypothesized to synthesize perceptual speech representations from sensory signals carrying convergent information from the visual and auditory modalities.