Theta- and Gamma-Band Activity Discriminates Face, Body and Object Perception

Face and body perception is mediated by configural mechanisms, which allow these stimuli to be perceived as a whole rather than as the sum of individual parts. Indirect measures of configural processing in visual cognition are the face and body inversion effects (FIE and BIE), which refer to the drop in performance when these stimuli are perceived upside-down. Although the FIE and BIE have been well characterized at the behavioral level, much remains to be understood about the neurophysiological correlates of these effects. Thus, in the current study, the brain's electrical activity was recorded with a 128-channel electroencephalogram (EEG) in 24 healthy participants while they perceived upright and inverted faces, bodies and houses. EEG data were analyzed in both the time domain (i.e., event-related potentials, ERPs) and the frequency domain [i.e., induced theta (5–7 Hz) and gamma (28–45 Hz) oscillations]. ERP amplitude results showed increased N170 amplitude for inverted faces and bodies (compared to the same stimuli presented in canonical position) but not for houses. ERP latency results showed delayed N170 components for inverted (vs. upright) faces and houses, but not bodies. Spectral analysis of induced oscillations indicated physiological FIE and BIE; that is, decreased gamma-band synchronization over right occipito-temporal electrodes for inverted (vs. upright) faces, and increased bilateral frontoparietal theta-band synchronization for inverted (vs. upright) faces. Furthermore, increased left occipito-temporal and right frontal theta-band synchronization for upright (vs. inverted) bodies was found. Our findings thus demonstrate clear differences in the neurophysiological correlates of face and body perception.
The neurophysiological FIE suggests disruption of feature binding processes (decrease in occipital gamma oscillations for inverted faces), together with enhanced feature-based attention (increase in frontoparietal theta oscillations for inverted faces). In contrast, the BIE may suggest that structural encoding for bodies is mediated by the first stages of configural processing (decrease in occipital theta oscillations for inverted bodies).


INTRODUCTION
Humans can identify hundreds of faces with ease, although all share a common 3D structure (i.e., two eyes above the nose, which is in turn above the mouth). It is believed that this extraordinary ability is mediated by face-sensitive perceptual mechanisms (i.e., configural processing), which allow the perception of faces as wholes (i.e., as gestalts), rather than a sum of the individual components (McKone and Yovel, 2009; Monti et al., 2020). The reduced accuracy (and increased latency) in recognizing faces when they are perceived upside-down rather than in their canonical orientation is known as the "face inversion effect" (FIE; Yin, 1969), and has traditionally been considered as (indirect) evidence for the existence of configural processing for upright faces only. In addition, since this effect is much smaller for non-face objects (Valentine, 1988) and objects of expertise (Robbins and McKone, 2007), it has been suggested that the FIE might be face-specific, thus leading to the conclusion that configural processing only mediates upright face perception (ibidem).
By configural processing, we refer to any phenomenon that involves perceiving spatial relations (i.e., configuration) among the features of a stimulus, such as a face (Reed et al., 2006). Maurer et al. (2002) identified distinct stages of configural processing: (i) first-order spatial relations define the relative positions in space of the parts of an object, such as the placement of the eyes above the nose; (ii) second-order relational information that refers to the exact metric distances between parts, e.g., the distance between eyes, nose, and mouth; and (iii) the last stage of configural processing is represented by the holistic stage or undifferentiated template representation of the face (i.e., perceiving the face as a whole; Gauthier and Tarr, 2002). Inversion was proven to affect all these configural processing stages (Maurer et al., 2002), and face processing is thought to specifically require and rely on the last stage (i.e., the holistic stage; ibidem).
Electrophysiological markers of holistic processing have been reported by means of electroencephalography (EEG), a technique that monitors the brain's electrical activity with excellent temporal resolution (Tucker, 1993). Much evidence suggests the existence of a face-sensitive event-related potential (ERP) negative component peaking at around 170 ms post-stimulus onset (N170) and reflecting early perceptual processing of the human visual system (Bentin et al., 1996, 1999; Rossion and Gauthier, 2002). Given that the N170 is larger and delayed for inverted faces, it is believed that this component reflects early visual structural encoding (Rossion et al., 2000; Watanabe et al., 2003). These findings have also been corroborated using magnetoencephalography (MEG), which allows the recording of neuromagnetic activity (Rivolta et al., 2012, 2014). ERPs reflect brain activity that is phase-locked (i.e., evoked) to the stimulus onset, and they are calculated by averaging the EEG signal from all trials. However, ERPs hide information derived from induced (non-phase-locked) activity, which mainly reflects high-level cognitive, rather than perceptual, activity (Uhlhaas and Singer, 2010; Donner and Siegel, 2011). The induced activity can be extracted on a single-trial level and can be retrieved by time-frequency analyses, also known as time-frequency representations (TFRs), on different frequency bands (Donner and Siegel, 2011; Oostenveld et al., 2011; Rivolta et al., 2015). In the visual system, high-frequency, low-amplitude gamma-band (>25 Hz) activity has been suggested to mediate perceptual binding and the grouping of visual information (Singer and Gray, 1995; Tallon-Baudry and Bertrand, 1999; Grent-t'-Jong et al., 2016). According to the "representational hypothesis," induced gamma synchronization is a sign of visual feature binding, also related to the holistic processing of faces (Tallon-Baudry and Bertrand, 1999).
In response to coherent visual stimuli (e.g., faces, as well as objects perceived in visual illusions), the induced gamma activity was focused at occipital and parieto-occipital locations, suggesting that it originates, at least in part, in visual areas (Grützner et al., 2013). This hypothesis is also supported by the finding that it partially follows a retinotopic organization. This definition implies that gamma-band oscillations represent a marker of holistic processing and second-order spatial information processing (i.e., the highest stages of configural processing; Maurer et al., 2002). Evidence suggests that enhanced gamma oscillations are induced by faces over occipito-temporal areas when compared to control stimuli such as houses or scrambled stimuli (Zion-Golumbic and Zion-Golumbic et al., 2008; Gao et al., 2012). Since these oscillations show a physiological FIE (i.e., upright faces induce enhanced synchronization in gamma-band oscillations when compared to inverted faces), this activity likely reflects the difficulty the visual system has in binding the features of inverted faces into a configural representation (i.e., holistic processing is not engaged by inverted stimuli; Lachaux et al., 2005; Anaki et al., 2007; Dobel et al., 2011; Moratti et al., 2014; Matsuzaki et al., 2015; Uono et al., 2017). In line with behavioral results, even non-face stimuli (i.e., objects) show a physiological inversion effect, albeit of smaller magnitude (Tallon-Baudry and Bertrand, 1999). Overall, behavioral and physiological (EEG/MEG) data converge and indicate that upright face perception is mediated by holistic processing, which, in turn, is mediated by gamma-band synchronization in the visual system.
Lower frequency oscillations, especially in the theta-band (4-7 Hz), correlate with various cognitive and attentional mechanisms (Ptak et al., 2017). It has been shown that corticohippocampal interactions mediate theta activity in cognition (Lopes da Silva, 1992; Başar, 1999) since theta activity represents the spontaneous rhythm of different limbic structures. Although theta-band connections between limbic structures and the visual cortex are not specifically linked to the FIE, they have been reported in various emotional paradigms (Aftanas et al., 2001, 2002) and several experiments involving facial recognition and facial emotional expressions (Başar et al., 2006; Başar, 2009, 2014). In those studies, occipital and occipito-temporal areas showed enhanced theta synchronization when processing emotionally arousing visual stimuli or faces showing emotional expressions at latencies around 200-500 ms post-stimulus. Theta synchronization was also observed when processing emotional facial expressions over different regions (Balconi and Lucchiari, 2006; Knyazev et al., 2009; Zhang et al., 2012). Furthermore, enhanced theta-gamma coupling induced by upright (as compared to inverted) faces has been shown in the right inferior occipital gyrus (IOG) after 200 ms post-stimulus onset (Sato et al., 2014). These findings suggest that occipital theta-band oscillations may represent a marker of the fast, early, perceptual processing of highly salient stimuli.
Although faces are critical in human social interactions, they are not the only stimuli we rely on; recognition of individuals also relies heavily on body processing. Similar to faces, bodies constitute fundamental mediums for emotional expression and communication, and they show a universal configuration (i.e., torso, arms, legs; for a review, see de Gelder et al., 2015).
One of the main differences between the two categories is that, while faces are a very important medium to convey identity, bodies strongly convey information about actions and intentions (Iacoboni et al., 2005). At the cognitive and neurophysiological level, however, body and face processing share certain perceptual mechanisms: body inversion, like face inversion, causes a drop in performance (i.e., the body inversion effect, BIE; Reed et al., 2003; Bonemei et al., 2018), which suggests that body perception is also mediated by configural processing. Additionally, as for faces, the N170 is larger and delayed for inverted bodies (Stekelenburg and de Gelder, 2004). When considering configural body processing, Reed et al. (2006) identified another stage of configural processing, located in between first- and second-order relational information: structural information, or the hierarchical structure stage. Structural information refers to information about the organization of parts in terms of the overall object as well as the spatial relationship of each type of part relative to each other. For instance, in a body, arms and legs may vary in how far above or below each other they are in space, but they are still connected to the same regions of the torso, which defines the overall hierarchical structure of the body. Even though both face and body perception rely on configural processing, these categories may involve different perceptual stages to different extents (Reed et al., 2006): while face processing relies on all levels of configural processing (Maurer et al., 2002), the study by Reed et al. (2006) showed that body processing seems to rely only on lower stages of configural processing (i.e., first-order relational information and structural information) and not on higher stages (i.e., holistic processing and second-order relational information; see also Minnebusch and Daum, 2009).
Indeed, they found a BIE in posture recognition when inverting intact bodies and half-body pictures divided along the vertical axis (maintaining structural information), but not half-body pictures divided along the horizontal axis (losing structural information), single body parts (based on feature processing) or scrambled bodies (losing first-order relational information; Reed et al., 2006).
Although the ERP correlates of the BIE have been investigated, no study has so far compared oscillatory activity for face and body perception. Thus, in the current study, we compared neural oscillations elicited by the presentation of upright and inverted faces to those elicited by (upright and inverted) bodies and houses. Houses were chosen as control stimuli since, like bodies and faces, they can vary in specific features and their configuration, but they are not social stimuli (Negrini et al., 2017).
We focused our attention on gamma- and theta-band activity; the former is a specific marker of feature binding processes (Tallon-Baudry and Bertrand, 1999) and holistic processing (Bentin et al., 1996, 1999; Rossion et al., 2000), whereas the latter is involved in the fast early perceptual processing of salient stimuli (Başar et al., 2006; Başar, 2009, 2014). We thus expected to observe greater gamma-band activity when participants were processing upright rather than inverted faces. In addition, based on the previous literature showing the importance of theta- and gamma-band activity in feature binding and facial recognition processes (Tallon-Baudry and Bertrand, 1999; Güntekin and Başar, 2014), we also expected to find some differences in processing between our stimulus categories in theta and gamma activity. More specifically, if the perceptual processes disrupted by face and body inversion were the same, then a similar pattern of oscillatory activity should be elicited by the inversion of both categories of visual stimuli, while a different pattern should be found for inverted houses. In contrast, if this was not the case, we expected to find different oscillatory patterns for each of the three stimulus categories.

Participants
Twenty-four healthy participants (11 M; mean age: 28.2 ± 5.8 years) were recruited, mainly from among university students and their acquaintances. One participant was excluded from the analyses due to technical problems related to data quality. All participants gave written informed consent before enrolment in this study and were screened for contraindications to EEG: exclusion criteria included a history of any neurological or psychiatric disease, use of active drugs, abuse of any drugs (including nicotine and alcohol) as well as any skin condition that could be worsened by the use of the EEG cap. The study was approved by the local Ethics Committee of the University of East London (UEL) and was conducted in accordance with the ethical standards laid out in the 1964 Declaration of Helsinki. All participants had normal or corrected-to-normal vision and were right-handed.

Stimuli
A total of 96 pictures were presented to each participant (one per trial). Thirty-two pictures of faces were extracted from the Radboud Faces Database (RaFD; Langner et al., 2010), 32 pictures of bodies were extracted from the Bodily Expressive Action Stimulus Test (BEAST; de Gelder and Van den Stock, 2011), and 32 pictures of houses were extracted from the dataset used in a previous EEG experiment (Negrini et al., 2017). All pictures representing faces and bodies conveyed a neutral expression, and they depicted 32 different actors for faces and 32 for bodies (balanced for gender). Half of the pictures were presented upright, and the other half were inverted, in a counterbalanced design across participants.
All pictures were converted into black and white images and cropped to a blank background using Adobe Photoshop CS5 software (Adobe Systems, Inc, 2011) with a dimension of 7 × 10.5 cm, which subtended a visual angle of 4° × 6° on a 22-inch LCD monitor positioned 100 cm away from participants.
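The relation between stimulus size, viewing distance and visual angle can be checked with a few lines of Python. This is only an illustrative sketch: the function name `visual_angle_deg` is hypothetical, and the formula assumes a flat stimulus centered on the line of sight.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) subtended by a centrally viewed stimulus."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Stimulus dimensions and viewing distance as reported in the text.
width_deg = visual_angle_deg(7.0, 100.0)
height_deg = visual_angle_deg(10.5, 100.0)
print(f"{width_deg:.2f} x {height_deg:.2f} degrees")
```

Both values come out close to the reported 4 and 6 degrees.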
To match all stimuli's low-level visual features, mean luminance was manipulated and balanced using MATLAB R2016a (Mathworks, Inc, 2016) and the SHINE toolbox (Willenbockel et al., 2010) by means of a customized script.

Procedure
After participants gave written informed consent, the EEG cap was put on (see next section for details). Participants were led into and seated in a dimly illuminated, electrically shielded room, where the cap was connected to the EEG amplifier, and participants began the experiment. The experiment was run using E-Prime 2.0 software (Schneider et al., 2007). Participants were presented with 96 stimuli that were divided into three blocks of 32 stimuli and randomly presented using a permuted block order for each participant. Before each block, five trials were presented as practice, and feedback was given on the participant's response. Each trial consisted of a fixation cross shown for 1 s and a stimulus displayed for 500 ms, followed by a response screen (maximum duration: 5 s), during which the participant could respond. Participants had to judge whether the stimulus was presented upright or inverted by pressing one of two buttons on an EGI Chronos response box. After the response (or after 5 s of response screen presentation), a gray screen was presented for 1 s before the next trial began.

EEG Data Recording and Analysis
EEG data were recorded using a high-density 128-channel Hydrocel Geodesic Sensor Net (Electrical Geodesic Inc., EGI, Eugene, OR, USA) referenced to the vertex (Tucker, 1993). The EEG signal was amplified with an EGI NetAmps 400 amplifier, digitized at a 1,000 Hz sampling rate, and recorded. No filters were applied during signal recording. Electrode impedances were kept below 50 kΩ throughout the experimental procedure. EEG data were analyzed using MATLAB version R2016a (Mathworks, Inc, 2016) and customized scripts as well as the EEGLAB (Delorme and Makeig, 2004) and FieldTrip toolboxes (Oostenveld et al., 2011). A band-pass filter (1-100 Hz) and a notch filter (50 Hz) were applied to limit the signal of interest and remove power line noise. Data were subsequently segmented into epochs (i.e., trials) of 2,000 ms length, starting from the presentation of the fixation cross and ending 500 ms after presentation of the response screen. Each trial was baseline-corrected by removing the values averaged over a period of 1,000 ms (from 1,000 to 0 ms before the stimulus), during which participants were looking at the fixation cross. After visual inspection, trials affected by prominent artifacts (i.e., major muscle movement and electric artifacts) were removed, and bad channels were deleted. On average, 90 trials per participant were included in the analysis. The signal was referenced to the common average of all electrodes (Dien, 1998), and independent component analysis (ICA) was applied to remove the remaining artifacts related to muscular and ocular activity. After the remaining artifacts had been removed using ICA, noisy channels were spatially interpolated.
To obtain ERPs, all trials were averaged for each condition and participant. The N170 component amplitude was computed by averaging the activity in the range of 140-200 ms. The exact time-window was defined by visual inspection of the butterfly plots for each condition (Figure 1).
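As a minimal sketch of the amplitude measure described above, the snippet below averages an ERP over the 140-200 ms window and also reads out the latency of the most negative sample in that window. The function names and the synthetic waveform are hypothetical, and taking the most negative sample as the N170 peak is an assumption rather than the procedure reported here.

```python
from math import exp

def mean_amplitude(erp, times_ms, start_ms=140, end_ms=200):
    """Mean voltage of an ERP within a time window (inclusive bounds)."""
    window = [v for v, t in zip(erp, times_ms) if start_ms <= t <= end_ms]
    return sum(window) / len(window)

def peak_latency(erp, times_ms, start_ms=140, end_ms=200):
    """Latency (ms) of the most negative sample within the window."""
    candidates = [(v, t) for v, t in zip(erp, times_ms) if start_ms <= t <= end_ms]
    return min(candidates)[1]

# Synthetic ERP (illustration only): a negative deflection centered on 170 ms.
times = list(range(0, 301, 10))
erp = [-5.0 * exp(-((t - 170) / 20) ** 2) for t in times]

n170_amplitude = mean_amplitude(erp, times)
n170_latency = peak_latency(erp, times)
print(n170_amplitude, n170_latency)
```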
TFRs of oscillatory power changes were computed separately for each of the six stimulus categories (faces upright and inverted, bodies upright and inverted, houses upright and inverted). Time-frequency power spectra were estimated using Morlet wavelet analysis based on 3.5 cycles at the lowest frequency (5 Hz) increasing to 18 cycles at the highest considered frequency (60 Hz; time steps: 10 ms; frequency steps: 1 Hz; Oostenveld et al., 2011). We divided neuronal response components into those evoked (i.e., phase-locked) vs. induced (i.e., non-phase-locked) by stimuli (Figure 2; David et al., 2006; Donner and Siegel, 2011; Cohen and Donner, 2013; Herrmann et al., 2014). The TFR of the induced response was then isolated by subtracting the individual time-domain average from each trial before calculating the TFRs for single trials (Cohen and Donner, 2013; Premoli et al., 2017). This approach was adopted since we performed single-trial normalization by z-transforming the TFR of each trial for each frequency. The z-transformation was performed on the respective mean and standard deviation derived from the full trial length. Following the z-transformation, an absolute baseline correction for each trial was performed by subtracting the average of the −400 to −100 ms period for each frequency to ensure z-values represented a change from the baseline. The baseline correction time-window (−400 to −100 ms) was chosen to avoid evoked time-frequency activity that could be found some ms before stimulus onset in low frequencies. Subsequently, TFRs were averaged across trials per experimental condition. After performing this procedure, the result consisted of an event-related spectral perturbation (ERSP) measure that is robustly normalized based on the single-trial level (Grandchamp and Delorme, 2011). In the end, TFRs were cropped to the period of interest (−500 to 500 ms), removing time-frequency bins at the trial edges for which no values could be computed.
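The steps around the wavelet transform can be sketched in plain Python. This is not the authors' MATLAB/FieldTrip code: the function names are hypothetical, the linear growth of wavelet width between the two reported endpoints is an assumption (the text only gives the endpoints), the population standard deviation is an arbitrary choice, and the wavelet convolution itself is omitted, so `power` stands for one trial's power time course at one frequency.

```python
from statistics import mean, pstdev

def n_cycles(freq_hz, f_min=5.0, f_max=60.0, c_min=3.5, c_max=18.0):
    """Wavelet width in cycles, assuming linear growth between the endpoints."""
    frac = (freq_hz - f_min) / (f_max - f_min)
    return c_min + frac * (c_max - c_min)

def remove_evoked(trials):
    """Subtract the across-trial average (the ERP) from every trial,
    leaving the non-phase-locked part of the signal."""
    n = len(trials)
    erp = [sum(samples) / n for samples in zip(*trials)]
    return [[x - e for x, e in zip(trial, erp)] for trial in trials]

def normalize_trial(power, times_ms, base_start=-400, base_end=-100):
    """z-score one trial's power over the full trial length, then subtract
    the mean z-value of the baseline window (absolute baseline correction)."""
    mu, sd = mean(power), pstdev(power)
    z = [(p - mu) / sd for p in power]
    base = mean([zv for zv, t in zip(z, times_ms) if base_start <= t <= base_end])
    return [zv - base for zv in z]

# Toy values (illustration only).
induced = remove_evoked([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
times = list(range(-500, 501, 100))
power = [1.0, 1.2, 0.9, 1.1, 1.0, 2.0, 3.0, 2.5, 2.0, 1.5, 1.2]
z_power = normalize_trial(power, times)
print(n_cycles(5.0), n_cycles(60.0), induced)
```

Subtracting the ERP before the time-frequency transform matters here because the z-transform is applied per trial, so the evoked part cannot simply be removed from the trial-averaged spectrum afterwards.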
Values were averaged across frequency bins to calculate power within two frequency bands, namely theta (5-7 Hz) and gamma (28-45 Hz), which are considered to be the most representative frequency ranges in the study of social visual stimuli (Güntekin and Başar, 2014).
The gamma range limits were chosen since phase-locked and time-locked gamma oscillations following visual stimulation are detectable in the 28-45 Hz range in different time-windows (Başar, 2012).
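Averaging over frequency bins amounts to collapsing the rows of a TFR that fall inside a band. The sketch below assumes a TFR stored as a list of rows (one per 1 Hz frequency bin, each holding values per time bin); the function name `band_average` and the toy matrix are hypothetical.

```python
def band_average(tfr, freqs_hz, f_lo, f_hi):
    """Average a TFR (rows = frequencies, columns = time bins) over the
    frequency bins inside [f_lo, f_hi], returning one value per time bin."""
    rows = [row for row, f in zip(tfr, freqs_hz) if f_lo <= f <= f_hi]
    return [sum(col) / len(rows) for col in zip(*rows)]

# Toy TFR (illustration only): each row simply holds its own frequency.
freqs = list(range(5, 61))
tfr = [[float(f)] * 3 for f in freqs]

theta = band_average(tfr, freqs, 5, 7)    # band limits as given in the text
gamma = band_average(tfr, freqs, 28, 45)
print(theta, gamma)
```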

Statistical Analyses
FIGURE 1 | Plots of event-related potential (ERP) activity calculated over 11 right occipito-temporal electrodes, averaged over 23 participants separately for three stimulus categories (faces, bodies, and houses). In each plot, the black line represents upright stimuli, while the cyan line represents inverted stimuli. This picture clearly shows larger N170 components (negative deflections around 170 ms post-stimulus onset) for inverted faces and bodies but not houses.
FIGURE 2 | Time-frequency representations (TFRs) of mixed, evoked (phase-locked) and induced (non-phase-locked) activity, calculated over 11 right occipito-temporal electrodes (region of interest determined by the literature on the face inversion effect), averaged over 23 participants. Activity elicited by upright faces is shown. The picture shows how the induced activity is determined by computing the TFR of evoked activity (on ERPs averaged across trials) and subtracting it from the mixed activity at the single-trial level.
To compare sensor-level EEG data between stimulus conditions, non-parametric cluster-based permutation analyses (using a Monte-Carlo method based on paired t-statistics) were performed (Maris and Oostenveld, 2007). This method was shown to be very accurate in solving the multiple comparisons problem in M/EEG data, and it has been compared with other broadly used approaches (i.e., bootstrap-based and Bayesian approaches; Maris, 2012). Considering data separated by frequency range and time-window, t-values exceeding an a priori threshold of p < 0.05 were clustered based on neighboring electrodes. Cluster-level statistics were calculated by taking the sum of the t-values within every cluster. Comparisons were performed for the maximum values of summed t-values. Using a permutation test (i.e., randomizing data across conditions and re-running the statistical test 1,500 times), we obtained a reference distribution of the maximum of summed cluster-level t-values to evaluate the statistic from the actual data. Clusters in the dataset were considered statistically significant at an alpha level of 0.05 if <5% of the permutations (N = 1,500) used to construct the reference distribution yielded a maximum cluster-level statistic larger than the cluster-level value observed in the original data.
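The logic of the cluster-based permutation test can be illustrated with a small standard-library sketch. It is a simplified stand-in for the FieldTrip routine, not a reproduction of it: all names are hypothetical, only the largest cluster is evaluated, label randomization is implemented as within-participant sign flips of the paired differences, the critical value 2.074 assumes a two-tailed threshold with 22 degrees of freedom (23 participants), and the toy data and four-sensor neighbourhood are invented for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

def paired_t(diffs):
    """t-statistic of a paired test, computed on within-participant differences."""
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def cluster_sums(t_values, neighbours, t_crit):
    """Sum t-values over clusters of adjacent supra-threshold sensors.
    Positive and negative effects are clustered separately."""
    sums, seen = [], set()
    for sign in (1, -1):
        for start in range(len(t_values)):
            if start in seen or sign * t_values[start] <= t_crit:
                continue
            seen.add(start)
            stack, total = [start], 0.0
            while stack:
                s = stack.pop()
                total += t_values[s]
                for nb in neighbours[s]:
                    if nb not in seen and sign * t_values[nb] > t_crit:
                        seen.add(nb)
                        stack.append(nb)
            sums.append(total)
    return sums

def max_cluster_stat(diffs_by_subject, neighbours, t_crit):
    """Largest absolute cluster-level statistic across all sensors."""
    n_sensors = len(diffs_by_subject[0])
    t_values = [paired_t([subj[s] for subj in diffs_by_subject])
                for s in range(n_sensors)]
    sums = cluster_sums(t_values, neighbours, t_crit)
    return max((abs(c) for c in sums), default=0.0)

def cluster_permutation_test(diffs_by_subject, neighbours, t_crit,
                             n_perm=1500, seed=0):
    """Return the observed maximum cluster statistic and its Monte-Carlo p-value."""
    rng = random.Random(seed)
    observed = max_cluster_stat(diffs_by_subject, neighbours, t_crit)
    n_larger = 0
    for _ in range(n_perm):
        # Swapping condition labels within a participant flips the sign
        # of that participant's paired differences.
        shuffled = []
        for subj in diffs_by_subject:
            sign = rng.choice((1, -1))
            shuffled.append([sign * v for v in subj])
        if max_cluster_stat(shuffled, neighbours, t_crit) > observed:
            n_larger += 1
    return observed, n_larger / n_perm

# Toy data (invented): 23 participants, 4 sensors in a row,
# with a true condition difference on sensors 0 and 1 only.
data_rng = random.Random(1)
toy = [[1.0 + data_rng.gauss(0, 0.3), 1.0 + data_rng.gauss(0, 0.3),
        data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)] for _ in range(23)]
line_neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
observed, p_value = cluster_permutation_test(toy, line_neighbours,
                                             t_crit=2.074, n_perm=500)
print(observed, p_value)
```

Because only the maximum cluster statistic of each permutation enters the reference distribution, a single comparison against that distribution controls the family-wise error rate across all sensors.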
To test whether our data replicated previous findings, three paired-samples t-tests were performed separately on ERPs computed by averaging trials for each participant in each condition. These comparisons investigated the inversion effect in different categories (faces upright vs. inverted, bodies upright vs. inverted, and houses upright vs. inverted) on the N170 component amplitude. Successively, ERP differences between upright and inverted stimuli were computed for the three categories (faces, bodies, and houses) by subtracting averaged µV values in the inverted condition from those in the upright condition for each timepoint. These differences were then compared by performing three paired-samples t-tests to test any interaction effects (face inversion vs. body inversion, face inversion vs. house inversion, and body inversion vs. house inversion).
To investigate the FIE and BIE on the N170 in more depth, we also performed a repeated-measures ANOVA on N170 latency. The downside of this approach is that it requires an a priori selection of channels, which is not necessary for cluster-based tests. For this reason, we selected only channels presenting a statistically significant difference in amplitude tests for the FIE. Peak latency from these 34 resulting occipito-temporal channels was averaged and analyzed using inversion (upright vs. inverted) and stimulus (faces vs. bodies vs. houses) as independent variables.
Subsequently, three paired-samples t-tests were performed separately on induced data to investigate the inversion effect in different category comparisons (faces upright vs. inverted, bodies upright vs. inverted, and houses upright vs. inverted). For these tests, the activity in different frequency ranges was separated as described above. One time-window of interest (TOI) was defined by both referring to the existing literature on face perception and visual inspection of occipital single plots of activity (Figure 3): the epoch of interest for induced activity was set to 250-500 ms. This TOI was chosen because induced activity (especially in the gamma range) typically starts approximately 280 ms after presentation of the stimulus and is clearly disentangled from evoked activity at this latency (Tallon-Baudry and Bertrand, 1999), while 500 ms was chosen as the limit of the TOI because it is the time when the response screen was presented; thus, we expected motor-related activity after this time-window. In this time-window, z-transformed values were averaged across time bins for each frequency.
Behavioral data were not analyzed for this study since the participants' task was extremely basic (identifying stimulus orientation, upright vs. inverted). Indeed, participants' accuracy was at ceiling (mean accuracy = 0.98). The task did not require processing information that could be harder to acquire when structural encoding is disrupted. For this reason, we did not expect to find a behavioral inversion effect on accuracy or response times: structural encoding is necessary to perform the task (identify whether the stimuli are presented upright or inverted), but would not influence participants' behavioral responses.

N170 Amplitude
Inverted faces showed a significantly larger N170 component than upright faces over a large bilateral occipito-temporal cluster of 34 electrodes (p = 0.001) and a frontal cluster of 48 electrodes (p = 0.001). Similarly, inverted bodies showed a significantly larger N170 component than upright bodies over a right occipito-temporal cluster (22 electrodes; p = 0.008) and a left frontal cluster (23 electrodes; p = 0.008). No statistically significant differences were found between upright and inverted houses (Figure 4A).
The results also revealed significant interactions between face inversion and house inversion over a bilateral occipito-temporal cluster (29 electrodes; p = 0.008) and a frontal cluster (44 electrodes; p = 0.003), suggesting that face inversion generates an increase in N170 amplitude that is significantly higher than the increase generated by house inversion. Moreover, a statistically significant interaction between body inversion and house inversion was found over a right occipito-temporal cluster (19 electrodes; p = 0.025) and a left frontal cluster (20 electrodes; p = 0.017), indicating that the increase in N170 amplitude generated by body inversion was significantly higher than the increase generated by house inversion. Face inversion and body inversion showed no statistically significant differences (Figure 4B). In summary, ERPs indicate that physiological FIE and BIE show similar magnitude and that house stimuli do not show an inversion effect.

Induced Activity
In the 250-500 ms time-window, a neurophysiological FIE emerged: inverted faces, compared to upright faces, induced stronger theta synchronization in a right fronto-parietal cluster (14 electrodes; p = 0.004) and in a left parietal cluster (10 electrodes; p = 0.023; Figure 5A). Participants also showed stronger theta synchronization induced by upright bodies (compared to inverted bodies) in a left-lateralized occipito-temporal cluster (19 electrodes; p = 0.006) and in a right prefrontal cluster (12 electrodes; p = 0.019).
The analysis of gamma-band activity showed stronger synchronization for upright faces than inverted faces (p = 0.002) over a right-lateralized occipito-temporal cluster of electrodes (21 electrodes; Figure 5B). No other inversion effects were statistically significant. In summary, gamma-band activity is reduced by face inversion only. FIE and BIE showed an opposite physiological pattern in theta: enhanced theta by face inversion and reduced theta by body inversion.
FIGURE 3 | Plots of induced time-frequency activity calculated over 11 right occipito-temporal electrodes and averaged over 23 participants for each category of stimuli. This picture allows the reader to observe differences in patterns of activity for each category, in particular over the theta (5-7 Hz) and gamma (28-45 Hz) bands.

DISCUSSION
At the behavioral level, upright face and body perception are mediated by configural processing, which allows perceiving these stimuli as a gestalt (McKone and Yovel, 2009), rather than the sum of the individual parts. Configural processing does not mediate the perception of inverted faces and bodies, which relies on part-based mechanisms (Maurer et al., 2002; Reed et al., 2006); this is why performance on face and body perception tasks drops after stimulus inversion (i.e., the inversion effect; Yin, 1969; Reed et al., 2003), and this has been suggested to reflect the presence of configural mechanisms for the perception of both categories of visual stimuli (Bonemei et al., 2018). Given that object perception is not affected by stimulus inversion, it has been suggested that object processing is only mediated by part-based, and not by configural, mechanisms (Gauthier and Tarr, 2002; Maurer et al., 2002). In summary, behavioral evidence indicates that both face and body perception rely on configural processing. What remains unknown is whether the inversion effects for faces and bodies are mediated by similar neurophysiological activity. Our findings, in line with previous evidence (Watanabe et al., 2003; Stekelenburg and de Gelder, 2004), indicate that FIE and BIE are characterized by a larger N170 for inverted stimuli (Rossion and Gauthier, 2002); this was spread over the bilateral occipito-temporal region for faces, albeit more lateralized to the right hemisphere for bodies (see Gliga and Dehaene-Lambertz, 2005 for a similar finding). Results found in N170 latency reflect what can be intuitively observed from Figure 1: delayed N170 components were found for inverted (vs. upright) faces and houses, but not bodies. These results show that the N170 delay is not a consistent and replicable index of BIE, as opposed to what was found for FIE (Rossion et al., 2000).
This is in line with previous evidence showing the absence of BIE on N170 latency, despite its presence on N170 amplitude (Minnebusch et al., 2010;Mohamed et al., 2011;Soria Bauser and Suchan, 2013).
Results on induced neural oscillations, however, revealed clear differences between FIE and BIE. Specifically, face inversion induced gamma-band desynchronization over occipito-temporal electrodes (developing to the right side) and synchronization in the theta band over bilateral frontoparietal regions, whereas body inversion induced a desynchronization in the theta band over left occipito-temporal and right prefrontal areas. It is therefore important to differentiate between the theta synchronization found for inverted faces and that found for upright bodies: cerebral activity in the same frequency band may mediate completely different functions in different cortical (and sub-cortical) areas and in different time-windows (Başar, 1999).

Gamma and Theta Correlates of Face Perception
Upright face perception, as compared to inverted face or object perception, is mediated by stronger gamma-band activity (Tallon-Baudry and Bertrand, 1999). Our results corroborate this finding and indicate that occipital gamma-band activity is a neurophysiological correlate of holistic face processing that, by definition, mediates upright face stimuli perception (Rodriguez et al., 1999; Anaki et al., 2007). The replication of previous findings in the gamma band demonstrates the reliability of the experimental paradigm and of the task we used, thus granting stronger reliability to our explanation of the novel results that we found in the theta band.
FIGURE 4 | Summary of ERP results. All topographies were obtained by calculating the average voltage over the time-window of interest (TOI) for N170 components (i.e., 140–200 ms after stimulus onset). (A) Topographies show the activity evoked by upright and inverted faces (first row), bodies (second row) and houses (third row). The third column of topographies shows clusters where statistically significant differences between upright and inverted stimuli were found by means of non-parametric cluster-based permutation tests. Inverted faces and bodies showed a larger N170 (occipito-temporal areas) and VPP (vertex positive potential, frontal areas) than upright stimuli. (B) Topographies show the results of comparisons between inversion effects (i.e., interaction effects): the first and the second images show that face inversion and body inversion lead to significantly different changes in activity over the previously reported areas when compared to house inversion. The third image shows that face and body inversion effects did not differ in a statistically significant way.
Activity in the theta band has been related to attention (Başar, 1999; Klimesch, 1999), and it has been specifically associated with feature-based attentional functions (i.e., attentional resources deployed to process single features of visual stimuli, not bound in a configuration; Harris et al., 2017). It is thus possible that theta-band synchronization associated with the FIE reflects an increase in attentional resources towards inverted faces (which may have been needed to recognize their orientation rapidly), since holistic face processing is disrupted by face inversion (Maurer et al., 2002). Furthermore, the scalp regions showing the FIE in theta are compatible with areas involved in the dorsal frontoparietal attentional network, responsible for endogenous allocation and maintenance of visuospatial attention (Corbetta et al., 2002; Ptak, 2012; Lückmann et al., 2014; Ptak et al., 2017). Even though it is known that upright faces automatically capture attention through a stimulus-driven/bottom-up mechanism (Sato and Kawahara, 2015), this is not the case for inverted faces (Langton et al., 2008; Tomonaga and Imura, 2009; Sato and Kawahara, 2015; Ariga and Arihara, 2018). Hence, it is likely that an explicit (top-down) attentional effort is required to process facial features that cannot be bound in a configuration, since holistic processing is disrupted by inversion. This deployment of attentional resources may be what triggers the increased theta activation in the frontoparietal network.

Gamma and Theta Correlates of Body Perception
Contrary to what we observed for faces, upright (as compared to inverted) body perception induced greater theta synchronization over the left occipito-temporal and right prefrontal areas, whereas no significant differences were found in the gamma band. These results suggest that (upright) body processing may be more related to the first stages of configural processing (i.e., first-order spatial information and structural information; Maurer et al., 2002; Reed et al., 2006) and feature-based processing. By contrast, differences that emerged during face processing are more related to the holistic processing of the stimulus, so that inverted faces require greater cognitive effort to be recognized. This interpretation is supported by findings showing that holistic processing may not be involved in the perception of human body forms (Soria Bauser et al., 2011; Soria Bauser and Suchan, 2013). Indeed, it has never been demonstrated that holistic processing and second-order spatial information processing are part of configural body processing (Minnebusch and Daum, 2009), whereas first-order spatial information and structural information have been shown to be involved in it, when considering both behavioral (Reed et al., 2006) and neuroimaging results (Brandman and Yovel, 2014).
FIGURE 5 | Summary of results related to the induced activity. All topographies were obtained by calculating the average power over the TOI for induced activity (i.e., 250–500 ms after stimulus onset). (A) Topographies show the activity induced by upright and inverted faces (first row) and bodies (second row) in the theta band (5–7 Hz). The third column of topographies shows clusters where statistically significant differences between upright and inverted stimuli were found by means of non-parametric cluster-based permutation tests. Increased theta synchronization is highlighted for inverted faces (vs. upright faces) over a right frontoparietal and a left parietal cluster, whereas upright bodies (vs. inverted bodies) induced a significantly stronger theta synchronization over a left-lateralized occipito-temporal cluster and a right prefrontal cluster. (B) Topographies representing gamma-band (28–45 Hz) activity induced by upright and inverted faces are shown. The statistical comparison highlighted a stronger gamma synchronization for upright faces (vs. inverted faces) over the right occipito-temporal cluster.
Additionally, the occipito-temporal synchronization showed clear lateralization to the left hemisphere. The literature has reported right lateralization for ERPs involved in body processing (for a review, see Gliga and Dehaene-Lambertz, 2005; de Gelder et al., 2015), similar to what we found for the N170. However, no previous studies have investigated oscillations in body processing. Considering that induced activity excludes, for the most part, evoked (time-locked) oscillations (i.e., activity related to ERPs), the left occipito-temporal induced synchronization represents a novel finding: while the early stages of body processing are primarily lateralized to the right hemisphere, a later (greater than 250 ms post-stimulus) stage of processing seems to be left-lateralized and consists of induced oscillations in the theta range. This result may imply the involvement of bilateral occipito-temporal cortices in different stages of body processing, constituting an important difference with the overall right-lateralization of face processing. This new result requires corroboration in future research.
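The distinction between evoked and induced activity drawn above can be illustrated with a minimal sketch. It assumes one common way of separating the two, namely subtracting the trial-averaged ERP from every trial before the time-frequency decomposition; this is not necessarily the pipeline used in the present study. The function names, the single-channel input layout and the wavelet width (`n_cycles`) are hypothetical choices made for illustration only.

```python
import numpy as np


def morlet_power(signal, sfreq, freq, n_cycles=5.0):
    """Power time course of a 1-D signal at one frequency (complex Morlet wavelet)."""
    # Temporal standard deviation of the Gaussian envelope (assumed convention).
    sigma_t = n_cycles / (2.0 * np.pi * freq)
    t = np.arange(-3.0 * sigma_t, 3.0 * sigma_t, 1.0 / sfreq)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
    wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit-energy normalization
    # The signal must be longer than the wavelet for mode="same" to keep its length.
    return np.abs(np.convolve(signal, wavelet, mode="same")) ** 2


def induced_and_evoked_power(epochs, sfreq, freq):
    """epochs: array of shape (n_trials, n_times) for a single channel."""
    epochs = np.asarray(epochs, dtype=float)
    erp = epochs.mean(axis=0)                # time-locked (evoked) response
    evoked = morlet_power(erp, sfreq, freq)  # power of the average
    residual = epochs - erp                  # remove the ERP from each trial
    # Induced power: average of single-trial power after ERP removal.
    induced = np.mean([morlet_power(trial, sfreq, freq) for trial in residual], axis=0)
    return induced, evoked
```

With this definition, a 6 Hz oscillation whose phase varies from trial to trial contributes mainly to the induced estimate, whereas an oscillation with identical phase on every trial ends up almost entirely in the evoked estimate.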
Upright and inverted houses did not show statistically significant differences in any of the considered frequency bands. This result confirms that this class of stimuli is much less subject to inversion effects, since their perception relies mostly on part-based processing (Negrini et al., 2017). The lack of an inversion effect for houses also confirms that the results we found are indeed specific to faces and bodies.
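The logic of a within-participant permutation comparison between upright and inverted stimuli can be sketched as follows. This is a deliberately simplified version: it tests a single value per participant (e.g., mean band power in one region and time window) by randomly flipping the sign of the paired differences, and it omits the spatial clustering step of the cluster-based permutation tests used for the topographies in this study. The function name and its parameters are hypothetical.

```python
import numpy as np


def paired_sign_flip_test(cond_a, cond_b, n_permutations=5000, seed=0):
    """Two-sided paired permutation test on one value per participant."""
    diffs = np.asarray(cond_a, dtype=float) - np.asarray(cond_b, dtype=float)
    observed = diffs.mean()
    rng = np.random.default_rng(seed)
    # Under the null hypothesis the condition labels are exchangeable within
    # each participant, so each paired difference may have its sign flipped.
    signs = rng.choice([-1.0, 1.0], size=(n_permutations, diffs.size))
    null = (signs * diffs).mean(axis=1)
    # Add-one correction so that the estimated p-value is never exactly zero.
    p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_permutations + 1)
    return observed, p_value
```

A call such as `paired_sign_flip_test(upright_power, inverted_power)`, with one value per participant in each (hypothetical) array, returns the mean difference and its permutation p-value.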

Conclusions and Future Directions
In this study, we found new evidence that the neurophysiological mechanisms mediating face and body inversion effects have important differences (Reed et al., 2006; Soria Bauser et al., 2011; Soria Bauser and Suchan, 2013): our results indicated the involvement of feature-binding processes for faces (i.e., occipital gamma activity). In addition, stimulus inversion can disrupt these processes and seems to require feature-based processing (i.e., frontoparietal theta activity). By contrast, the BIE appears to be less related to holistic processing and more to the first stages of configural processing.
Previous studies have investigated the causal relationship between neural markers of the BIE and the structural encoding process involved. The present study, by contrast, is more explorative and correlational in nature, a limitation it shares with most EEG/MEG and functional magnetic resonance imaging (fMRI) research. In other words, we could identify neurophysiological correlates of stimulus inversion, but without evidence of a causal relationship between these correlates and behavior. However, to the best of our knowledge, no previous studies have investigated the oscillatory aspects of neural processes involved in body processing and, a fortiori, in the body inversion effect. A possible way to establish a causal relationship between behavior and neural oscillations might be, for instance, the adoption of transcranial alternating current stimulation (tACS; Gonzalez-Perez et al., 2019). Future work should also directly assess holistic processing by using different behavioral tasks such as the composite face task (Rossion, 2013).

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the local Ethics Committee of the University of East London (UEL) and were conducted in accordance with the ethical standards laid out in the 1964 Declaration of Helsinki. The participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
FB, PR and DR designed the experiment. FB and SB acquired the data. FB, IP and SP analyzed the data. FB, IP, PR and DR wrote the manuscript.

FUNDING
This work has been supported by Fondazione Cariplo and Regione Lombardia (Grant No. 2016-0925) to PR, by a scholarship from the University of Milano-Bicocca to FB and by the UK National Institute for Health Research Biomedical Research Centre at the South London and Maudsley NHS Foundation Trust to IP.