Identifying environmental sounds: a multimodal mapping study

Our environment is full of auditory events such as warnings or hazards, and their correct recognition is essential. We explored environmental sounds (ES) recognition in a series of studies. In study 1 we performed an Activation Likelihood Estimation (ALE) meta-analysis of neuroimaging experiments addressing ES processing to delineate the network of areas consistently involved in ES processing. Areas consistently activated in the ALE meta-analysis were the STG/MTG, insula/rolandic operculum, parahippocampal gyrus and inferior frontal gyrus bilaterally. Some of these areas truly reflect ES processing, whereas others are related to design choices, e.g., type of task, type of control condition, type of stimulus. In study 2 we report on 7 neurosurgical patients with lesions involving the areas which were found to be activated by the ALE meta-analysis. We tested their ES recognition abilities and found an impairment of ES recognition. These results indicate that deficits of ES recognition do not exclusively reflect lesions to the right or to the left hemisphere but both hemispheres are involved. The most frequently lesioned area is the hippocampus/insula/STG. We made sure that any impairment in ES recognition would not be related to language problems, but reflect impaired ES processing. In study 3 we carried out an fMRI study on patients (vs. healthy controls) to investigate how the areas involved in ES might be functionally deregulated because of a lesion. The fMRI evidenced that controls activated the right IFG, the STG bilaterally and the left insula. We applied a multimodal mapping approach and found that, although the meta-analysis showed that part of the left and right STG/MTG activation during ES processing might in part be related to design choices, this area was one of the most frequently lesioned areas in our patients, thus highlighting its causal role in ES processing. 
We found that the ROIs we drew on the two clusters of activation found in the left and in the right STG overlapped with the lesions of at least 4 of the 7 patients, indicating that the lack of STG activation found for patients is related to brain damage and is crucial for explaining the ES deficit.

Keywords: environmental sounds, fMRI, neurosurgical patients, Activation Likelihood Estimation (ALE) meta-analysis, lesion mapping

INTRODUCTION
Recognizing a sound such as a telephone ringing or a dog barking seems an effortless task. The ability to process environmental sounds (ES) is essential for everyday life: it allows us to detect warnings (e.g., a siren) and threats (e.g., a rattlesnake), recognize when a device is functioning correctly (e.g., clicking of a stapler) or incorrectly (e.g., water dripping), locate an event in space (e.g., an explosion), monitor a change in status (e.g., chiming of a cuckoo clock), and communicate an emotional (e.g., a scream) or physical condition (e.g., a burp) (Marcell et al., 2000).
The network of areas involved in ES processing has also been investigated by functional imaging studies (see, for example, the ES processing model by Lewis et al., 2004). In particular, activations have been reported in the posterior middle temporal gyrus (MTG) as well as in areas like the inferior frontal gyrus (IFG) (Lewis et al., 2004), which is anatomically connected with the auditory cortex (Hackett et al., 1999; Romanski et al., 1999a,b; Romanski and Goldman-Rakic, 2002). Areas of activation were found in the MTG and the precuneus bilaterally and in the posterior portion of the left IFG; activation in this area was higher for sound recognition vs. sound localization (Maeder et al., 2001). In addition, activation can be found in the insula (Sharda and Singh, 2012), an area which has numerous direct connections with the auditory cortex (Bamiou et al., 2003) and which can cause auditory agnosia if lesioned bilaterally (Engelien et al., 1995). The parahippocampal gyri (Sharda and Singh, 2012) can also be activated by sound recognition, possibly reflecting the "imageability" of ES (Engel et al., 2009). Lastly, sound-related activations were found in various subcortical regions like the thalamus (Sharda and Singh, 2012), which is part of the auditory pathway, as well as the caudate and putamen. Some studies suggested that activation is rather right-lateralized, especially in the (non-primary) auditory cortex such as the superior temporal gyrus (STG) and in the inferior prefrontal cortex (Bergerbest et al., 2004). A PET study (Zatorre et al., 1992) showed that cerebral blood flow (CBF) in the inferior prefrontal cortex depends on the type of cognitive operation involved in ES processing (see also Specht and Reul, 2003). In a similar vein, some authors (Thierry et al., 2003) proposed that the connectivity to the left-lateralized semantic network is primarily right-sided for ES and left-sided for words.
Others (Specht and Reul, 2003) argued that activation in the right STG and superior temporal sulcus (STS) plays the same crucial role in the analysis of non-speech sounds as the left STS does in speech perception. Last, Dick et al. (2007) found that language and ES stimuli evoked very similar volumes of activation in their language-based regions of interest in the left hemisphere, whereas they found greater activation for ES stimuli in the right hemisphere. Studies also showed that the areas involved in ES processing can be modulated by the type of stimulus. Like semantic processing, ES recognition is characterized by category specificity: vocalizations (Fecteau et al., 2004; Lewis et al., 2009; Rauschecker and Scott, 2009; Staeren et al., 2009; Leaver and Rauschecker, 2010) and human-produced action sounds (Lewis, 2006; Lewis et al., 2006; Altmann et al., 2007) are different classes of stimuli triggering different activations. To sum up, studies disagree as to which areas belong to the network supporting ES processing, similarly to what emerges from the analysis of the neuropsychological data reported above.
Considering these discrepancies, the aim of our study was to investigate which nodes of the network triggered by ES processing are essential for ES processing and which areas are accessory. We thus compared neuroimaging and neuropsychological data between patients with an ES recognition deficit and normal controls. Being a correlation-based method, fMRI can delineate the brain networks engaged in ES processing, but the areas that are critical can only be identified by studying patients with a deficit in ES recognition following brain damage. We first performed an Activation Likelihood Estimation (ALE) meta-analysis to investigate the regions which were found to be consistently activated in neuroimaging studies of ES recognition. In study 2 we selected neurosurgical patients with lesions involving these areas and tested their ES abilities. In study 3 we performed an fMRI study to understand how the key areas involved in ES recognition are functionally deregulated in patients as compared to control subjects. In our patients we investigated which parts of the ES processing network identified by the meta-analysis of fMRI studies are critically involved in the task. We also tested which parts of that network would be deregulated in the patients' fMRI maps. We acknowledge that our sample size is relatively small; however, auditory agnosia for ES is a rare neuropsychological disorder. Previously published neuropsychological reports focused mainly on single cases, with few exceptions of group studies. Furthermore, at variance with previous studies, our patient sample had selective and relatively small lesions in comparison to patients with stroke lesions, which typically involve large parts of the cortex, or to patients with neurodegenerative disorders, which affect multiple areas.

MATERIALS AND METHODS
We performed three consecutive studies. In Study 1 we used an Activation Likelihood Estimation (ALE) Meta-Analysis to identify the areas which are consistently activated in neuroimaging studies of ES processing. In Study 2, we report on neurosurgical patients who had a lesion involving the areas revealed by the ALE Meta-Analysis and were impaired at ES recognition. In Study 3 we compared the fMRI maps of patients and healthy controls to understand how these areas might be functionally deregulated because of the lesion.

Data Used for the Meta-analysis
The functional imaging studies included in this meta-analysis were obtained from a comprehensive literature review of the PubMed, ISI Web of Knowledge and Cochrane databases focusing on ES recognition (search strings: "ES," "fMRI," "PET," "SPECT"). The references of the retrieved articles were screened in order to identify additional articles dealing with the neural correlates of ES recognition. Inclusion criteria were as follows: neurologically healthy adults and experiments requiring participants to process ES during fMRI/PET/SPECT measurements. The foci reported in each study had to be given in a standard reference space (Talairach/Tournoux, MNI). Differences in coordinate spaces (MNI vs. Talairach space) were accounted for by transforming coordinates reported in Talairach space into MNI coordinates using a linear transformation model (Lancaster et al., 2007). A random-effects analysis was performed, and single-subject reports were excluded. Table 1 reports the significance level of the reported activations. All studies reported activations surviving corrections for multiple comparisons, except studies 2, 16, 19, 20, 22, 27, and 37.
Based on these criteria, data from a total of 25 articles (including 22 fMRI and 3 PET studies) were entered into the study (see Table 1). In total, 37 experiments, i.e., lists of activation foci, were included in the first meta-analysis because 8 studies reported coordinates for more than one contrast. In this case, all contrasts were included in the analysis, since all of them reflected ES-related activations, e.g., in Lewis et al. (2006), coordinates from two contrasts (tools vs. animal and animal vs. tools) were reported, and we included both. Please note that this is a common procedure, as can be found in previous ALE meta-analyses (for instance in Caspers et al., 2010; Tomasino et al., 2012, 2014). Taken together, the meta-analysis included data from 263 subjects and 627 activation foci.

Statistical Procedure
A statistical map was generated using lists of x, y, and z coordinates after transferring these foci into MNI space (Lancaster et al., 2007). The meta-analysis was completed using the revised version (Eickhoff et al., 2012) of the GingerALE 2.1.1 software (brainmap.org) for coordinate-based meta-analysis of neuroimaging results (Turkeltaub et al., 2002; Laird et al., 2005, 2009). Using the False Discovery Rate (FDR) with q = 0.01, the test was corrected for multiple comparisons (Laird et al., 2005; Eickhoff et al., 2009, 2012), and a minimum cluster size of 100 mm³ was set. The resulting areas were anatomically labeled by reference to probabilistic cytoarchitectonic maps of the human brain using the SPM Anatomy Toolbox (Eickhoff et al., 2005). Using a Maximum Probability Map (MPM), activations were assigned to the most probable histological area at their respective locations.
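To make the FDR step concrete, the following is a minimal sketch of a Benjamini-Hochberg style threshold at q = 0.01. It is not the GingerALE implementation; the function name `fdr_threshold` and the p-values are hypothetical and serve only as an illustration.

```python
def fdr_threshold(p_values, q=0.01):
    """Return the largest p-value that passes the Benjamini-Hochberg
    criterion at level q, or None when no p-value survives."""
    ranked = sorted(p_values)
    n = len(ranked)
    threshold = None
    for rank, p in enumerate(ranked, start=1):
        # BH criterion: the k-th smallest p-value must satisfy p <= (k / n) * q
        if p <= rank / n * q:
            threshold = p
    return threshold


# Hypothetical voxel-wise p-values (illustrative only, not data from the study).
p_vals = [0.00001, 0.0004, 0.002, 0.03, 0.2, 0.6]
cutoff = fdr_threshold(p_vals, q=0.01)
significant = [p for p in p_vals if cutoff is not None and p <= cutoff]
```

In a full pipeline the surviving voxels would then be grouped into clusters, and clusters smaller than the minimum volume (here 100 mm³) would be discarded.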
In meta-analysis 1, we identified the neural regions that were found to be consistently activated when listening to ES across multiple studies. In meta-analyses 2-5 (see below) we investigated how design choices might influence the activation observed in the list of the fMRI studies we evaluated.
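The idea of "consistent activation across studies" can be sketched as follows: each reported focus is modeled as a 3D Gaussian, each experiment contributes a modeled activation (MA) value per voxel, and the ALE value is the union of these probabilities across experiments. This is a simplified illustration under stated assumptions, not the GingerALE code: the fixed 10 mm FWHM, the 8 mm³ voxel volume, the use of the maximum over foci within an experiment, and all coordinates are hypothetical choices.

```python
import math


def gaussian_prob(voxel, focus, fwhm_mm=10.0, voxel_volume_mm3=8.0):
    """Probability mass of a 3D Gaussian centred on `focus`, evaluated at `voxel`.
    The kernel width and voxel volume are illustrative assumptions."""
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    d2 = sum((v - f) ** 2 for v, f in zip(voxel, focus))
    density = math.exp(-d2 / (2.0 * sigma ** 2)) / ((2.0 * math.pi) ** 1.5 * sigma ** 3)
    return min(1.0, density * voxel_volume_mm3)


def ale_value(voxel, experiments, **kernel):
    """Union of the per-experiment MA values at one voxel: 1 - prod(1 - MA_i)."""
    prob_no_activation = 1.0
    for foci in experiments:
        # MA value of one experiment: maximum over its foci (an assumption here).
        ma = max(gaussian_prob(voxel, f, **kernel) for f in foci)
        prob_no_activation *= 1.0 - ma
    return 1.0 - prob_no_activation


# Hypothetical MNI foci grouped by experiment (not taken from Table 1).
experiments = [
    [(60, -20, 4), (-58, -24, 6)],
    [(62, -18, 2)],
    [(-40, 20, 10)],
]
near = ale_value((60, -20, 4), experiments)   # voxel close to two experiments' foci
far = ale_value((0, 60, 40), experiments)     # voxel far from every focus
```

A voxel close to foci from several experiments receives a higher ALE value than one far from all of them; significance is then assessed against a null distribution, which this sketch omits.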

Table 1. Types of stimuli, task and control task employed, silent listening vs. judgments (L or J), button-press conditions, rest vs. silent events (R-S) employed as control condition in the contrast image that was analyzed, number of subjects investigated, contrast used in the present analysis, significance level of the reported activations (threshold), and number of selected foci for the ALE meta-analysis.
The studies involved different categories of environmental sounds: E, environmental sounds; S, speech sounds; A, animal sounds; Obj, object-related sounds; Act, action-related sounds; T, tool-related; H, human sounds (e.g., coughing, laughing); and M, mechanical sounds (e.g., helicopter, water).
Frontiers in Human Neuroscience | www.frontiersin.org

Participants
Patients
Inclusion/exclusion criteria. We included 7 neurosurgical patients meeting the following inclusion/exclusion criteria. Inclusion criteria were: a lesion involving areas included in the results of the meta-analysis, i.e., right and left temporo-insular-opercular cortex, being a native Italian speaker, normal or corrected-to-normal vision, and no history of psychiatric disease or drug abuse. Patients were excluded if they reported a hearing loss (as measured by the audiogram examination routinely performed before surgery), previous history of neurological problems or family history of developmental language problems or learning disabilities, as well as inadequate speech comprehension, as they needed to understand the task and the instructions (for the tests included in their neuropsychological screening, see Table 2). Patients should not present with aphasia, as measured with standardized clinical tests (see Table 2). In particular, by excluding patients with aphasia and naming deficits we made sure that any impairment in ES recognition would not be related to language problems but reflect impaired ES processing. Lastly, among patients with lesions involving the right hemisphere we excluded those who had visuo-spatial/attentive deficits to make sure that any impairment in ES recognition would not be related to disorders of spatial attention (i.e., auditory) but reflect impaired ES processing. Seven right-handed neurosurgical patients (4 M, 3 F; mean age 55.57 ± 12.47 years; mean years of schooling 13.28 ± 4.34) were admitted to the local General Hospital some days before the beginning of the study. We tested patients' ES recognition ability before surgery. Each patient received a neuropsychological battery the day before the fMRI. The neuropsychological evaluation included tests assessing nonverbal intelligence, verbal short-term memory, praxis, visuospatial ability and planning, constructional apraxia, and language. All the patients performed these tasks successfully (see Table 2).
Conventional T2-weighted MR imaging revealed low-grade lesions (mean volume 82.17 ± 50.77 cc, range: 9.94-161 cc). The lesion overlap of all the patients showed that the lesions involved part of the left and right STG and MTG, temporal pole, hippocampus and parahippocampal area, insula, rolandic operculum, IFG (pars opercularis and triangularis), precentral gyrus and basal ganglia (see Figure 2A). The overlay plot of all the patients' lesions indicated the voxels most frequently damaged (see the bar code). The most frequently lesioned area (in bright green-yellow) corresponds to the hippocampus/insula/STG (see Figure 2A).
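A lesion overlay of this kind amounts to counting, for every voxel, how many patients have a lesion there. The sketch below assumes binary masks already normalized to a common space and flattened to lists, with 1 coding lesioned tissue (note that this is the opposite of the 0 = tumor coding used for the normalization mask). The function name and the toy masks are hypothetical.

```python
def lesion_overlap(masks):
    """Count, voxel by voxel, how many patients have a lesion there.

    Each mask is a flat list of 0/1 values (1 = lesioned) in a common space.
    """
    n_voxels = len(masks[0])
    if any(len(m) != n_voxels for m in masks):
        raise ValueError("all masks must have the same number of voxels")
    return [sum(m[i] for m in masks) for i in range(n_voxels)]


# Toy masks for three hypothetical patients over five voxels.
masks = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
overlap = lesion_overlap(masks)
peak = max(overlap)
most_frequent_voxels = [i for i, count in enumerate(overlap) if count == peak]
```

The voxels with the highest count correspond to the brightest region of an overlay plot.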
The study was approved by the Ethics Committee of our Institute and performed in accordance with the 1964 Declaration of Helsinki and subsequent amendments. The subjects' consent was obtained.

Environmental Sound Auditory Confrontation Naming Task
Stimulus norming study
The primary goal of our norming study was to create a corpus of stimuli and responses, develop scoring criteria and determine a cut-off for the patients' Z scores. We used the original corpus of stimuli by Marcell et al. (2000) (N = 120), consisting of everyday, non-verbal digitized sounds belonging to many different categories such as sounds produced by animals, people, musical instruments, tools, signals, and fluids, to conduct our own rating study for the Italian population, as there might be population-dependent differences in sound knowledge and frequency. The sound files were downloaded from their archive (http://marcellm.people.cofc.edu/confrontation%20sound%20naming/confront.htm) as 16-bit *.WAV files with a sampling rate of 22,050 Hz.
Following Marcell et al. (2000), the participants' primary task was to carefully listen to sounds and name the stimuli. Sounds were presented at a comfortable, preset loudness established through pilot testing. Each of the randomly ordered sounds was presented once, and participants were allowed 30 s to complete their identification. The task lasted about 45 min. Presentation® software (Version 9.9, Neurobehavioral Systems Inc., CA, USA) was used for auditory stimulus presentation. Answers were recorded by a PC and written down by the experimenter for later analysis. The experimenter then used these responses to establish the criteria for scoring the patients' accuracy.
In order to determine the mean accuracy and evaluate the subjects' responses, we used the same criteria as Marcell et al. (2000). In particular, the following were scored as correct: synonyms, accurate descriptions of the sound, plurals, and self-corrections. By contrast, inaccurate descriptions of the sound, lack of response or a "don't know" type of response, and generalized superordinate descriptions of the item were scored as incorrect. Furthermore, our rating study revealed that some sounds were identified by healthy controls differently from the responses reported in Marcell et al.'s study (Marcell et al., 2000). For example, our healthy participants recognized sounds like "explosion" as a "shot" (N = 5/20 subjects), sounds like "frying food" as "rain" (N = 9 subjects), or sounds like "typewriter (manual)" as "cash register" (N = 7/20 subjects). For these items, if patients responded in a similar way as controls, we accepted their responses as correct. Last, there were some sounds that were not identified by healthy controls, such as cutting paper or water dripping (in both instances, 10/20 [50%] subjects did not recognize the sound) or a sonar (6/20 [33%] subjects did not recognize the sound). For these items, if patients responded in a similar way as controls, we accepted their responses as correct. Following these criteria, the participants correctly identified 87.83 ± 4.38% of the sounds (range 80-94.44%). This result is very similar to the mean naming accuracy reported by Marcell et al. (2000) in their rating study (82.18 ± 22.67).
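The scoring rules above can be sketched as a lookup against a set of accepted labels per sound, where the set contains the target, its synonyms, and any alternative label that the norming sample also produced. This is only an illustration: the function names, the English labels, and the example responses are hypothetical, and real scoring was done by human raters.

```python
def normalise(text):
    """Lower-case a response and collapse repeated whitespace."""
    return " ".join(text.lower().split())


def score_response(response, accepted):
    """Return True if a naming response counts as correct.

    `accepted` holds the target label plus synonyms and any alternative
    labels that the norming sample also produced for that sound.
    """
    if response is None:
        return False  # lack of response is scored as incorrect
    answer = normalise(response)
    if answer in ("", "don't know", "i don't know"):
        return False
    return answer in {normalise(label) for label in accepted}


# Accepted labels, including alternatives given by healthy controls.
accepted_labels = {
    "explosion": {"explosion", "shot"},
    "frying food": {"frying food", "rain"},
    "typewriter (manual)": {"typewriter", "cash register"},
}
# Hypothetical responses from one participant.
responses = {
    "explosion": "Shot",
    "frying food": "I don't know",
    "typewriter (manual)": "piano",
}
correct = sum(score_response(responses[s], accepted_labels[s]) for s in accepted_labels)
accuracy = 100.0 * correct / len(accepted_labels)
```

Exact string matching is a deliberate simplification; judging "accurate descriptions of the sound" requires a rater.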

Task and procedure
Patients were asked to carefully listen to some sounds and name the stimuli ("Identify each sound as quickly and as accurately as you can"). We used the naming task as in Marcell et al.'s study. Sounds were presented at a comfortable, preset loudness established through pilot testing. Each of the randomly ordered sounds was presented once, and participants were allowed 30 s to complete their identification. Presentation® software (Version 9.9, Neurobehavioral Systems Inc., CA, USA) was used for stimulus presentation. Answers were recorded by a PC and written down by the experimenter for later analysis. Both the patients and healthy controls performed this task prior to the fMRI session. A typical testing session lasted 45 min.

Data analysis
Responses were analyzed by two independent raters. Accuracy was computed following the guidelines of Marcell et al. (2000) and according to the results of our own stimulus rating study (see above). For each patient we computed the Z score, with healthy individuals as the reference group, to determine the number of patients whose performance was below the normal range. In addition, we performed a qualitative analysis of errors and labeled them as: semantically related to the target sound, auditorily related to the target sound (some errors were both semantically and auditorily related to the target sound, in which case we coded them as semantically and auditorily related), unrelated, and "I don't know," and we expressed the number of errors of each type as a percentage of the total errors. Last, we coded the errors according to the sound categories of Marcell et al. (2000) (in their paper, a classification of sounds according to 27 categories can be found in Table 10).
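As an illustration of this analysis, the sketch below computes Z scores against the control mean and standard deviation reported in the norming study (87.83 ± 4.38%) and expresses error types as percentages of all errors. The cut-off of -2, the patient accuracies, and the error labels are hypothetical; the source does not state the cut-off value used.

```python
def z_score(score, control_mean, control_sd):
    """Standardise a patient's accuracy against the control distribution."""
    return (score - control_mean) / control_sd


def error_percentages(error_labels):
    """Express the count of each error type as a percentage of all errors."""
    total = len(error_labels)
    return {label: 100.0 * error_labels.count(label) / total
            for label in set(error_labels)}


CONTROL_MEAN = 87.83  # mean accuracy (%) of the norming sample
CONTROL_SD = 4.38     # standard deviation of the norming sample
CUTOFF = -2.0         # hypothetical cut-off, not stated in the source

# Hypothetical patient accuracies (%), for illustration only.
patient_accuracy = {"P1": 70.0, "P2": 85.0, "P3": 55.5}
z = {pid: z_score(acc, CONTROL_MEAN, CONTROL_SD) for pid, acc in patient_accuracy.items()}
impaired = sorted(pid for pid, value in z.items() if value < CUTOFF)

# Hypothetical error codes for one patient.
errors = ["semantic", "semantic", "unrelated", "auditory"]
shares = error_percentages(errors)
```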

Study 3: Functional Magnetic Resonance Imaging (fMRI) Study
The same ES auditory confrontation naming task with the same stimulus list described above was used during fMRI measurements involving the patients included in Study 2. Study 2 was a stimulus norming study including the original corpus of stimuli by Marcell et al. (2000) (N = 120), whereas Study 3 included the final set of 90 stimuli. In the fMRI study, patients and healthy participants (see below) silently named the stimuli. We carefully instructed the subjects on how to perform the task. We asked them to listen carefully and name each stimulus, and told them that at the end of the fMRI acquisition they would be asked some questions about each stimulus. Patients were highly motivated to perform the fMRI task as they knew that the fMRI maps are part of their clinical examination. More importantly, during fMRI acquisition we routinely performed an online General Linear Model (GLM) analysis and continuously checked the activation and the correlation of the BOLD signal with the alternation of task and rest. If the GLM analysis showed that activation correlated significantly with the task, we took this as evidence that patients were performing the task appropriately. If, on the contrary, no correlation emerged, we stopped the acquisition, talked to the patient and started the task again.

Healthy Controls for the fMRI Study
The patients' fMRI maps were compared with those of a control group consisting of 12 monolingual native Italian speakers (7 F, 5 M; mean age 35.75 ± 4.2 years; age range 29-41; mean handedness 88.88 ± 17.88, range 50-100; mean education 16.5 ± 2.23 years, range 13-18). We found a significant difference in age and education [t (17) = 19.82, p < 0.001 and t (17) = −3.2, p < 0.05] between patients and healthy controls, but no gender effect [t (17) = −0.41, p > 0.05]. All participants had normal or corrected-to-normal vision and no history of neurological illness, psychiatric disease, or drug abuse. Following Marcell et al. (2000), we checked that none of the healthy controls responded affirmatively to the self-report question, "To the best of your knowledge, do you have a hearing loss?" All gave their informed consent to participate in the study.

Task and Procedure
The task started with an instruction (3 s). Subjects were asked to "carefully listen to the sounds and silently name the source of each sound as accurately and as quickly as possible." During auditory stimulation a fixation cross was present on the screen. Blocks of ES recognition stimuli (N = 18, 15 s each) were alternated with baseline resting periods (N = 17, 15 s each, plus two additional resting blocks, one at the beginning of the run and the other at the end). In the baseline condition, a fixation cross (15 s) was presented between blocks and patients and controls were asked to relax. Each 15-s block included 5 stimuli, for a total of 90 stimuli. The same stimulus list used in the off-line pre-fMRI testing was presented during scanning. Presentation® software (Version 9.9, Neurobehavioral Systems Inc., CA, USA) was used for stimulus presentation and synchronization with the MR scanner. Participants listened to the stimuli via an Audio System (Resonance Technology).
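The block structure described above can be laid out explicitly. The following sketch builds the rest/task schedule (starting and ending with rest) and checks the block and stimulus counts. It assumes that timing starts after the 3-s instruction and that the five stimuli share each block evenly; both are assumptions for illustration, and the names are hypothetical.

```python
BLOCK_S = 15            # duration of every block in seconds
N_TASK_BLOCKS = 18      # ES recognition blocks
STIMULI_PER_BLOCK = 5   # sounds presented in each task block


def build_schedule():
    """Return (onset_s, duration_s, label) tuples, starting and ending with rest."""
    schedule, onset = [], 0
    for _ in range(N_TASK_BLOCKS):
        schedule.append((onset, BLOCK_S, "rest"))
        onset += BLOCK_S
        schedule.append((onset, BLOCK_S, "task"))
        onset += BLOCK_S
    schedule.append((onset, BLOCK_S, "rest"))  # closing rest block
    return schedule


schedule = build_schedule()
n_task = sum(1 for _, _, label in schedule if label == "task")
n_rest = sum(1 for _, _, label in schedule if label == "rest")
total_s = sum(duration for _, duration, _ in schedule)
n_stimuli = n_task * STIMULI_PER_BLOCK
slot_s = BLOCK_S / STIMULI_PER_BLOCK  # time available per stimulus, if evenly spaced
```

With 18 task blocks and 17 + 2 rest blocks, the schedule spans 37 blocks, i.e., 555 s, and contains 90 stimuli.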

fMRI Data Acquisition
A 3-T Philips Achieva whole-body scanner was used for both the patients and healthy controls to acquire T1-weighted anatomical images and functional images using a SENSE-Head-8 channel head coil and a custom-built head restrainer to minimize head movements. For both the patients and controls, functional images were obtained using a T2*-weighted echo-planar image (N = 222 EPI) sequence of the whole brain. The imaging parameters were as follows: repetition time, TR = 2500 ms; echo time, TE = 35 ms; field of view, FOV = 23 cm; acquisition matrix: 128 × 128; slice thickness: 3 mm with no gaps; 90° flip angle; voxel size: 1.8 × 1.8 × 3 mm; parallel imaging, SENSE = 2. Functional images were preceded by 5 dummy images that allowed the MR scanner to reach a steady state.
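As a quick consistency check of these parameters, the in-plane voxel size follows from the field of view divided by the matrix size, and the run length follows from the number of volumes times the TR. The sketch below only restates values given in the text; the variable names are hypothetical.

```python
FOV_MM = 230.0     # field of view (23 cm)
MATRIX = 128       # acquisition matrix is 128 x 128
TR_S = 2.5         # repetition time in seconds
N_VOLUMES = 222    # EPI volumes analysed
N_DUMMIES = 5      # dummy volumes acquired before the run

in_plane_mm = FOV_MM / MATRIX   # in-plane voxel edge, about 1.8 mm
run_s = N_VOLUMES * TR_S        # duration of the analysed run in seconds
dummy_s = N_DUMMIES * TR_S      # time spent on dummy volumes in seconds
```

The resulting 555 s matches the 37 blocks of 15 s described under Task and Procedure.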
fMRI Data Processing and Whole Brain Analysis
fMRI data pre-processing and statistical analysis were performed on UNIX workstations (Ubuntu 8.04 LTS, i386, http://www.ubuntu.com/) using MATLAB r2007b (The Mathworks Inc., Natick, MA/USA) and SPM5 (Statistical Parametric Mapping software, SPM; Wellcome Department of Imaging Neuroscience, London, UK). Dummy images were discarded before further image processing. Pre-processing included spatial realignment of the images to the reference volume of the time series, segmentation producing the parameter file used for normalization of functional data to a standard EPI template of the Montreal Neurological Institute provided by SPM5, re-sampling to a voxel size of 2 × 2 × 2 mm, and spatial smoothing with a 6-mm FWHM Gaussian kernel to meet the statistical requirements of the General Linear Model and to compensate for residual macro-anatomical variations across subjects.
We checked that the movement parameters for all the patients and healthy controls were < 3 mm for translation and < 3° for rotation. We used the lesion masking image, i.e., a ROI image drawn on the patient's lesion in which the voxels are coded as 0 (tumor area) and 1 (healthy brain tissue). In the normalization procedure, we included the lesion masking image following the technique of Brett et al. (2001). This procedure excludes the masked region (i.e., the lesion, which would otherwise produce artifacts altering the normalization outcome) from normalization. Then, the normalization outcome was inspected carefully. In particular, three observers (B.T., D.S., and M.M.) independently compared the original and the normalized images and ruled out any distortion.
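The movement check can be sketched as a simple threshold test on the six realignment parameters per volume. The sketch assumes rows of three translations in mm followed by three rotations in radians, which is how SPM realignment parameter files are usually laid out; that layout, the function name, and the example values are assumptions for illustration.

```python
import math


def motion_within_limits(params, max_trans_mm=3.0, max_rot_deg=3.0):
    """Check every volume against the translation and rotation limits.

    `params` holds rows of (x, y, z, pitch, roll, yaw): translations in mm,
    rotations in radians (assumed layout).
    """
    for row in params:
        if any(abs(t) >= max_trans_mm for t in row[:3]):
            return False
        if any(abs(math.degrees(r)) >= max_rot_deg for r in row[3:6]):
            return False
    return True


# Hypothetical realignment parameters for two short runs.
good_run = [(0.5, -1.2, 0.3, 0.01, -0.02, 0.0), (0.7, -1.0, 0.4, 0.012, -0.018, 0.001)]
bad_run = [(0.5, -1.2, 0.3, 0.01, -0.02, 0.0), (3.5, 0.0, 0.0, 0.0, 0.0, 0.0)]
```

Calling `motion_within_limits(good_run)` accepts the first run, while the second is rejected because one translation exceeds 3 mm.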
To delineate the network related to the ES recognition task, we modeled the alternating epochs by a simple boxcar reference vector. A general linear model for blocked designs was applied to each voxel of the functional data by modeling the activation and the baseline conditions for each subject and their temporal derivatives by means of reference waveforms which correspond to boxcar functions convolved with a hemodynamic response function (Friston et al., 1995a,b). Furthermore, we included 6 additional regressors that modeled the head movement parameters obtained from the realignment procedure. Accordingly, a design matrix, which comprised contrasts modeling alternating intervals of "activation" and "baseline" (resting), was defined. At the single-subject level, specific effects were assessed by applying appropriate linear contrasts to the parameter estimates of the baseline and experimental conditions, resulting in t-statistics for each voxel. For the single-subject first-level analysis, low-frequency signal drifts were filtered using a cut-off period of 128 s. These t-statistics were then transformed into Z-statistics constituting statistical parametric maps (SPM{Z}) of differences across experimental conditions and between experimental conditions and the baseline. SPM{Z} statistics were interpreted in light of the theory of probabilistic behavior of Gaussian random fields (Friston et al., 1995a,b). With regard to second-level random effects analyses for both patients and healthy controls, contrast images obtained from individual participants were entered into a one-sample t-test to generate a SPM{T} indicative of significant activations specific for this contrast at the group level. For both the patients' and the controls' group we included age and education as covariates. We used a threshold of P < 0.05, corrected for multiple comparisons at the cluster level [using family-wise error (FWE)], with a height threshold at the voxel level of P < 0.001, uncorrected.
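The core of this first-level model, a boxcar convolved with a hemodynamic response function and fitted by least squares, can be sketched on synthetic data. This is a simplified illustration and not the SPM5 pipeline: the double-gamma response is an approximation of the canonical shape, and temporal derivatives, motion regressors, and high-pass filtering are omitted. All names, the noise level, and the simulated signal are hypothetical.

```python
import math
import random

TR = 2.5           # repetition time in seconds
N_SCANS = 222      # volumes per run
BLOCK_SCANS = 6    # 15-s block divided by 2.5-s TR


def hrf(t):
    """Approximate double-gamma response: peak near 5 s, late undershoot."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


def boxcar():
    """Rest/task alternation starting with rest (0 = rest, 1 = task)."""
    return [1.0 if (i // BLOCK_SCANS) % 2 == 1 else 0.0 for i in range(N_SCANS)]


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of the signal."""
    return [sum(signal[i - k] * kernel[k] for k in range(min(i + 1, len(kernel))))
            for i in range(len(signal))]


def fit_glm(y, x):
    """OLS fit of y = b0 + b1 * x; returns (b1, t-statistic of b1)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b1, b1 / se


kernel = [hrf(k * TR) for k in range(13)]   # about 32 s of response
regressor = convolve(boxcar(), kernel)

# Synthetic voxel time course: baseline 100, true effect 2.0, Gaussian noise.
rng = random.Random(0)
signal = [100.0 + 2.0 * r + rng.gauss(0.0, 0.2) for r in regressor]
beta, t_stat = fit_glm(signal, regressor)
```

The estimated `beta` recovers the simulated effect and `t_stat` is the voxel-wise statistic that would then be thresholded.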
The following contrast images were calculated: first, we estimated the main effects of CONDITION (ES listening-baseline for the controls > ES listening-baseline for the patients), then we performed a conjunction null analysis (and not a global null analysis) (Friston et al., 1999), showing the commonly activated network for both tasks (ES listening-baseline for the patients > ES listening-baseline for the controls) using a threshold of p < 0.05, corrected for multiple comparisons at the cluster level (using FWE), with a height threshold at the voxel level of p < 0.001, uncorrected. The anatomical interpretation of the functional imaging results was performed using the SPM Anatomy toolbox (Eickhoff et al., 2005).
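A conjunction null analysis can be thought of as a minimum-statistic test: a voxel is retained only if every contrast entering the conjunction exceeds the threshold on its own. The sketch below illustrates this on toy values; the function name, the t-values, and the threshold of 3.1 are hypothetical and do not reproduce the cluster-level FWE correction used in the study.

```python
def conjunction_null(t_map_a, t_map_b, threshold):
    """Flag voxels where both contrasts exceed the threshold.

    Equivalent to thresholding the voxel-wise minimum of the two t-maps.
    """
    return [min(a, b) > threshold for a, b in zip(t_map_a, t_map_b)]


# Hypothetical voxel-wise t-values for two contrasts over five voxels.
t_contrast_1 = [4.2, 1.0, 3.5, 5.0, 0.2]
t_contrast_2 = [3.9, 4.5, 2.0, 6.1, 0.1]
common = conjunction_null(t_contrast_1, t_contrast_2, threshold=3.1)
```

Only the first and fourth voxels pass, because the other voxels are significant in at most one of the two contrasts.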

Study 1: Meta-analysis Study of the Reviewed fMRI Studies about Environmental Sound Processing
The activation clusters resulting from meta-analysis 1 of all the reviewed studies comprised: (i) the right STG, extending to the MTG and Heschl's gyrus, the insula and the operculum [...] (see Table 3 and Figure 1).
In meta-analysis 2 we first investigated the effect of the type of stimulus. Action- vs. animal-related stimuli activated the left and right MTG, the left SMG, the left pars triangularis of the IFG and the right pars orbitalis. Animal- vs. action-related stimuli activated the left and right STG (see Table 3 and Figure 1). As indicated in Figure 1, the data show that part of the network related to ES processing is influenced by the "type of stimulus" factor.
In meta-analysis 3 we addressed the effect of the type of control sound. Some studies compared ES stimuli to silent stimuli, rest or fixation conditions, and others compared ES stimuli to other control auditory sounds. Control sounds vs. silent events or rest activated the IFG (pars triangularis). Silent events or rest vs. control sounds activated the right STG and the left MTG, the left SMG and the right precentral gyrus, the right insula, the right putamen, the middle cingulate cortex, and the cerebellum bilaterally (see Table 3 and Figure 1).
In meta-analysis 4 we addressed the weight of the type of task: category judgment vs. passive listening. Part of the activation found in the left IFG (pars opercularis) and left rolandic operculum was related to making a category judgment compared to passive listening. This contrast revealed activation in the right STG and the left Heschl's gyrus (see Table 3 and Figure 1).
In meta-analysis 5 we addressed the weight of the type of response: button press vs. no button press/silent decision. Part of the activation found in the right and left STG, the right and left insula, the right and left IFG (pars triangularis and orbitalis), the right putamen and the right cerebellum is related to button press. The contrast revealed activation in the left inferior parietal lobe only (see Table 3 and Figure 1).
To sum up, the networks found in meta-analyses 2-5 do not tap ES-related activations because, according to the logic of cognitive subtraction, this is "subtracted out" and the resultant map reflects the effect of an external variable (i.e., type of stimulus, or type of response) on the network. Figure 1A shows the overlap between the general network and the networks found in meta-analyses 2-5 (in green). The areas that are not influenced by the effect of any external variable are shown in red. These included: the hippocampal area bilaterally, the right rolandic operculum, part of the STG bilaterally, and the left post-central area and superior parietal lobule.

The general network involved in ES recognition and the results of the subtraction analyses are reported. Peaks of activation corrected above the threshold, MNI Coordinates (x, y, z) of maximum ALE value, and maximum ALE value of this cluster. All peaks are assigned to the most probable brain areas as revealed by the SPM Anatomy Toolbox (Eickhoff et al., 2005).

Table 4 and Figure 2 show the patients' performances on the ES confrontation naming task. Patients scored below the normal range (as measured by Z-scores). Most of the patients' responses were not related to target sounds (35.94 ± 17.21%, see Figure 2 and Table 5) or were semantically related to target sounds (29.72 ± 8.73%). The other types of responses were: auditorily related (12.19 ± 6.66%), semantically and auditorily related (11.01 ± 4.20%), and "I don't know" answers (11.85 ± 20.72%) (unrecognized by patients but correctly identified by controls). Last, we coded the errors according to the sound categories by Marcell et al. (2000) (see their Table 10 for a classification of sounds according to 27 categories).
We found that 15.14% of the patients' errors involved musical instruments, 14.74% involved animal sounds, 37.45% involved other categories (e.g., transportation, nature, signals, accidents, weapons), and 32.67% involved actions/human sounds. Note that stimuli belonging to the "musical instruments" and "animal sounds" categories are less numerous than those belonging to the "other" and "actions/human sounds" categories. For this reason, living vs. non-living related differences were not investigated further.
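The Z-score criterion used for the naming task can be sketched as follows. This is only an illustration: the control and patient accuracies are invented numbers, and the cutoff of -2 is an assumed convention, since the exact cutoff is not stated here.

```python
from statistics import mean, stdev


def z_score(patient_score, control_scores):
    """Standardize a patient's score against the control sample."""
    return (patient_score - mean(control_scores)) / stdev(control_scores)


# Hypothetical naming accuracies in percent (not data from this study).
control_accuracy = [92.0, 95.0, 90.0, 94.0, 89.0]
patient_accuracy = 60.0

z = z_score(patient_accuracy, control_accuracy)
below_normal = z < -2.0  # assumed cutoff for "below the normal range"
print(round(z, 2), below_normal)
```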

Study 3: fMRI Investigation
The areas showing a different activation in patients vs. controls (controls > patients) were: (i) the right STG, (ii) the left STG, (iii) the right IFG (pars opercularis), and (iv) the left insula extending to the IFG (pars triangularis) (see Figure 2 and Table 5). As to differences in activation across the STG, by using Marsbar (http://marsbar.sourceforge.net/), we drew two ROIs on the two clusters found in the left and the right STG (which were less activated in patients than controls), shown respectively in pink and in red in Figure 2B. We overlapped all the lesion ROIs and checked the density bar, which shows how many patients had a lesion overlapping with the two STG ROIs. The map showed that the red ROI (right STG) overlapped with about 2 of the RH lesions, and that the pink ROI (left STG) overlapped with about 2 of the LH lesions. Taken together, these data suggest that the two ROIs on the two clusters found in the left and the right STG overlapped with at least 4 of the 7 lesions, indicating that the lack of activation in the STG is related to brain damage.
The reverse contrast (patients > controls) revealed higher activation in patients than in controls in the anterior cingulate, the right thalamus and the left inferior parietal lobule.
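The overlap count described above can be sketched as a simple set operation on voxel masks. The sketch does not reproduce the Marsbar workflow: the patient labels, voxel indices and helper name are hypothetical, and the toy masks were simply chosen so that two lesions touch each ROI.

```python
def count_overlapping_lesions(roi, lesions):
    """Count lesion masks that share at least one voxel with the ROI."""
    return sum(1 for lesion in lesions.values() if roi & lesion)


# Toy binary masks stored as sets of (x, y, z) voxel indices (made up).
roi_right_stg = {(1, 0, 0), (2, 0, 0)}
roi_left_stg = {(-1, 0, 0), (-2, 0, 0)}

lesions = {
    "P1": {(1, 0, 0), (1, 1, 0)},    # touches the right ROI
    "P2": {(2, 0, 0)},               # touches the right ROI
    "P3": {(5, 5, 5)},               # no overlap
    "P4": {(-1, 0, 0)},              # touches the left ROI
    "P5": {(-2, 0, 0), (-3, 0, 0)},  # touches the left ROI
    "P6": {(9, 9, 9)},               # no overlap
    "P7": {(-9, 9, 9)},              # no overlap
}

n_right = count_overlapping_lesions(roi_right_stg, lesions)
n_left = count_overlapping_lesions(roi_left_stg, lesions)
n_either = count_overlapping_lesions(roi_right_stg | roi_left_stg, lesions)
print(n_right, n_left, n_either, "of", len(lesions))
```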
The functional areas whose activation was comparable to that of controls as revealed by the conjunction analysis (sound listening > baseline in patients > sound listening > baseline in controls) included: (i) the right Heschl's gyrus, extending to the STG, (ii) the left STG, extending to the MTG, (iii) the right IFG (pars opercularis), (iv) the left IFG (pars opercularis), (v) the SMA bilaterally, and (vi) the right insula (see Figure 2 and Table 5).

Overlap of the ALE Map with the fMRI Map of Patients and Controls
In Figure 3 we used the "Logical Overlays" function in Mango (http://ric.uthscsa.edu/mango/). We overlapped the ALE map (in blue) with the fMRI map of our patients (in green) and that of healthy controls (in red). Different combinations of overlaps were included, e.g., fMRI controls and fMRI patients; ALE map and fMRI controls. As shown in Figure 3, the three maps overlap in the STS. This is consistent with the lower activation in the STS found in patients vs. controls (see the two ROIs shown in Figure 2C).
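A logical overlay of this kind amounts to a voxel-wise AND across binary maps. The sketch below illustrates the idea with plain sets. It does not reproduce Mango's implementation, and the map names and voxel indices are hypothetical.

```python
def logical_overlay(*maps):
    """Return the voxels present in every supplied binary map (logical AND)."""
    result = set(maps[0])
    for voxel_map in maps[1:]:
        result &= voxel_map
    return result


# Toy binary maps stored as sets of (x, y, z) voxel indices (made up).
ale_map = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
fmri_patients = {(0, 0, 0), (1, 0, 0), (7, 7, 7)}
fmri_controls = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (8, 8, 8)}

all_three = logical_overlay(ale_map, fmri_patients, fmri_controls)
ale_and_controls = logical_overlay(ale_map, fmri_controls)
print(sorted(all_three), sorted(ale_and_controls))
```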

DISCUSSION
In the present multimodal study we used a new approach combining different techniques and looking for converging evidence from multiple sources to explore the neuroanatomy of ES recognition.
The ALE meta-analysis delineated the core set of regions involved in ES as evidenced by fMRI literature. This analysis revealed the network of areas supporting ES recognition. We next showed that not all of the clusters truly reflect ES processing and how design choices, e.g., type of stimulus, type of task, type of control condition, might have influenced the activation observed in the fMRI studies we evaluated.

(Karnath et al., 2004). The number of overlapping lesions is illustrated by different colors that code for increasing frequencies (as indicated in the bar code). (B) Patients' pathological performance (mean accuracy) and healthy controls' accuracy and patients' qualitative analysis of errors. (C) The most frequently lesioned area (in bright green-yellow) is the hippocampus/insula/superior temporal gyrus, as shown by the Anatomy toolbox. By using Marsbar (http://marsbar.sourceforge.net/), we drew two ROIs on the two clusters found in the left and in the right STG (which were less activated in patients than controls), shown respectively in pink and in red. The density bar shows that at least 4 out of 7 patients' lesions overlapped with the ROIs drawn on the STG. (D) Network of areas commonly activated in patients and controls and areas differentially recruited by controls vs. patients during ES recognition in addition to the network for ES processing in patients and controls. Activations were superimposed on a brain template provided by SPM5.

The hippocampal area bilaterally, the right rolandic operculum, part of the STG bilaterally, and the left postcentral area and superior parietal lobule were not influenced by any of the factors which might influence the ES network. Part of this region (i.e., hippocampus, rolandic operculum, and STG/MTG bilaterally) was found to be most frequently lesioned in our patient sample.
The pathological performance of neurosurgical patients showed that areas of the ES network have a causal role in ES processing, since a lesion in those areas caused a deficit in ES recognition. Patients had a normal performance on the neuropsychological screening. In particular, by excluding patients with aphasia and naming deficits we made sure that any impairment in ES recognition would not be related to language problems but reflect impaired ES processing.
These results indicate that a deficit of ES recognition does not arise exclusively following lesions to the right hemisphere or left hemisphere. So far, no precise anatomical locations have been correlated with auditory agnosia (Lewis et al., 2004). Our data, thus, add new information to the ES recognition related literature, showing that both the left and the right hemisphere, if damaged, can cause a deficit in ES recognition.
One crucial area involved in ES is the STG/MTG. Although the meta-analysis showed that part of the left and right STG/MTG activation during ES processing might be in part related to design choices, this area was one of the most frequently lesioned areas in our patient sample, thus highlighting its causal role in ES processing. The planum temporale is the auditory association cortex and it represents the first (input and processing) node (or computational hub, Griffiths and Warren, 2002) of the network involved in segregating the components of the acoustic stimulus and matching these components with learned spectrotemporal representations. The information is then gated to higher-order cortical areas for further processing (Griffiths and Warren, 2002). The STG activation has been reported as reflecting the input stages of ES processing (Lewis et al., 2004). In our fMRI study, right STG activation was found in patients and controls. This suggests that these areas were still actively functional in patients, too. However, the controls > patients comparison revealed a greater activation of a sub-part of the STG in controls vs. patients, meaning that patients lacked activation in a crucial sector of the STG. The two ROIs on the two clusters found in the left and the right STG overlapped with at least 4 out of 7 patients' lesions, indicating that the lack of STG activation found for patients is related to brain damage. Several authors found bilateral activations in the STG during ES processing, with a larger region of activation in the right STG (Bergerbest et al., 2004). The right STG posterior to the primary auditory cortex has been proposed as a node of the "neural semantic detector model" describing semantic memory for non-verbal sounds (Kraut et al., 2006). Of course, as evidenced by the ES recognition model (Lewis et al., 2004), too, the information resulting from the described processing steps needs the intervention of the semantic system, which is lateralized to the LH. Accordingly, the meta-analysis showed that studies requiring active judgment or categorization as compared to those requiring passive listening additionally activated the left IFG and the left rolandic operculum.

For each region of activation, the coordinates in MNI space are provided with reference to the maximally activated voxel within an area of activation, as indicated by the highest Z-value (P < 0.05, corrected for multiple comparisons at the cluster level, height threshold P < 0.001, uncorrected). LH/RH, left/right hemisphere; M, medial.

Our ALE meta-analysis included the right and the left parahippocampal gyrus, as does the temporal part of the lesion map. Its role in sound recognition might be related to the localization of sounds or the mental spatial imagery of source sound localization. Some authors suggested that the activation of this area possibly reflects the "imageability" of sounds (Sharda and Singh, 2012; see also Engel et al., 2009).
ES processing requires allocating auditory attention/memory to the input sounds. Accordingly, the right IFG has been related to auditory working memory (Zatorre et al., 1994; Zatorre, 2001), or to allocating auditory attention (Lipschutz et al., 2002; Binder et al., 2004). Interestingly, it has been shown that auditory verbal hallucinations predominantly activate the right IFG (Sommer et al., 2008). In our fMRI study, the direct controls > patients comparison revealed that a part of the right IFG was more activated in controls than in patients, meaning that impaired patients lacked activation in a crucial sector of the IFG. It has been suggested that ES is polymodal in nature and the IFG bilaterally is responsible for integrating polymodal object representations with concepts in semantic memory. Interestingly, we found that activation in different sectors of the IFG was related to many design choices. Only the right pars triangularis truly reflected activation related to ES processing. This region, together with the insula, is an area that was frequently lesioned in our patients. It is part of a finely tuned attentional network which selects information from the continuous flow of auditory signals and triggers communication and balance between the RH and LH according to the nature of the stimulus (Habib et al., 1995). The insula has numerous direct connections with the auditory cortex (Adams and Janata, 2002; Bamiou et al., 2003). A comparison of a patient with total agnosia following bilateral insular damage (Habib et al., 1995) with a case with no agnosia following left insula-thalamocortical projection damage (Hyman and Tranel, 1989) is indicative of the essential role of the bilateral insula in auditory stimuli preprocessing.

FIGURE 3 | We used the "Logical Overlays" function in Mango (http://ric.uthscsa.edu/mango/). We overlapped the ALE map (in blue) with the fMRI map of our patients (in green) and that of healthy controls (in red). Different combinations of overlaps were included, e.g., fMRI control and fMRI patients; ALE map and fMRI controls. In particular, in green the overlap of the three maps in the STS.

Since ES recognition is both a top-down and bottom-up driven process, it presupposes an interaction between many areas in the brain, as well as an involvement of the fiber tracts. In our patients, the lesions might also be interpreted in terms of damage to the fiber tracts. Indeed, many of the clusters discussed here are interconnected with the input nodes of the temporal cortex. Accordingly, in the temporal-parietal lobe area, the activation found in the ALE map, including the left precentral/post-central gyrus/supramarginal gyrus, was related to the type of stimulus. We found that action- vs. animal-related stimuli activated the left supramarginal gyrus. It is known that perceptual processing and semantic processing interact to represent ES. Thus, these activations might be related to the action/human sound category processing. With regard to action-related verbs and phrases, it has been shown that imagery of the verbal context could be responsible for activation in sensorimotor areas (e.g., Tomasino and Rumiati, 2013a,b).

CONCLUSION
ES recognition is dependent on a bilateral network of areas in the temporal and inferior frontal cortex, the basal ganglia, and areas of the pre- and post-central gyrus, as shown by the ALE meta-analysis. We showed that some of these clusters of activation truly reflect ES processing, whereas others are related to design choices.
The hippocampal area bilaterally, the right rolandic operculum, part of the STG bilaterally, and the left postcentral area and superior parietal lobule were not influenced by any of the factors which might influence the ES network. In addition, the lesion map evidenced areas that are necessary for ES processing, namely the hippocampus, STG/MTG area and the rolandic operculum, which might be deregulated in activation as compared to healthy controls.