A Temporal Neural Trace of Wavelet Coefficients in Human Object Vision: An MEG Study

The wavelet transform has been widely used in image and signal processing applications such as denoising and compression. In this study, we explore the relation between the wavelet representation of stimuli and MEG signals acquired during a human object recognition experiment. To investigate the signature of wavelet descriptors in the visual system, we apply five levels of multi-resolution wavelet decomposition to the stimuli presented to participants during MEG recording and extract the approximation and detail (horizontal, vertical, diagonal) sub-band coefficients at each level of decomposition. In addition, employing multivariate pattern analysis (MVPA), we train and test a linear support vector machine (SVM) classifier over time on MEG pattern vectors to decode neural information. We then calculate a representational dissimilarity matrix (RDM) at each time point of the MEG data and for each wavelet descriptor, using classifier accuracy and one minus the Pearson correlation coefficient, respectively. Given the time courses obtained by performing the Pearson correlation between the wavelet descriptor RDMs and the MEG decoding accuracy at each time point, our results show that the peak latency of the wavelet approximation time courses occurs later for higher-level coefficients. Furthermore, studying the neural trace of the detail sub-bands indicates that the overall number of statistically significant time points for the horizontal and vertical detail coefficients is noticeably higher than for the diagonal detail coefficients, confirming the evidence of the oblique effect, namely that horizontal and vertical lines are more decodable in the human brain.


INTRODUCTION
Feature engineering, that is, mapping input data to a discriminative feature space, is the most important and critical part of classical object recognition systems. Various machine learning applications such as texture analysis, image compression and denoising utilize visual features, and wavelet representations of images have served as promising features for object recognition (Strickland and Hahn, 1997; Tieng and Boles, 1997; Khalil and Bayoumi, 2002; Vidal-Naquet and Ullman, 2003; Samani and Moghaddam, 2017; Samani et al., 2018). Performing a two-dimensional wavelet transform on an image provides one approximation sub-band and three detail (horizontal, vertical and diagonal) sub-bands. One study extracted the shape of pedestrians using only a combination of wavelet coefficients as input features to a support vector machine (SVM) classifier and could then detect pedestrians in images with different indoor and outdoor backgrounds with reasonable performance. Another proposed a Gabor wavelet model as a representation of images, which yielded better object recognition than conventional Gabor filters. Furthermore, the wavelet transform has been applied to diverse types of images, from medical to real-world images, which demonstrates its capability in dealing with computer vision challenges.
On the other hand, among the numerous neuroscience and psychology studies of visual processing (Attneave, 1954; Liu et al., 2002; Chaumon et al., 2009; Mamashli et al., 2010), many behavioral and neuroimaging studies of visual perception in different species, including humans, have confirmed that cardinal (horizontal or vertical) details of visual stimuli are better resolved in the brain than oblique ones, which is known as the oblique effect (Taylor, 1963; Appelle, 1972; Freeman and Thibos, 1975; Poggio and Fischer, 1977; Orban and Vandenbussche, 1979; Essock, 1980; Bonds, 1982; Payne and Berman, 1983; Moskowitz and Sokol, 1985; Heeley et al., 1997; Pantazis et al., 2017). Neural data from different functional neuroimaging modalities such as EEG, MEG and fMRI have been used to demonstrate this effect in human visual processing. Furmanski and Engel (2000) used stimuli with cardinal and diagonal orientations in an fMRI study and found that cardinal orientations generate an increased fMRI response amplitude in area V1. Another study presented standard and deviant stimuli containing task-irrelevant Gabor patches in an oddball sequence during a tracking task to investigate non-attended orientation anisotropies using event-related potentials (ERPs). By recording the visual mismatch negativity, the authors found a difference between the amplitudes of the ERPs evoked by standard and deviant stimuli around 170 ms in occipitotemporal areas, as evidence for the existence of the oblique effect. A further study applied multivariate pattern analysis to the gamma band of MEG data acquired in an experiment using six different grating stimuli. Its results show that cardinal orientations are better decoded in the human brain than oblique ones.
As described above, many studies in both human and machine vision support the idea that orientation detail is a key factor in object recognition. Here, we build a bridge between a purely machine learning method and a purely neuroimaging method in an object recognition challenge. In this study, we used the same stimuli that were presented in the human object recognition experiment. We extracted wavelet detail coefficients in horizontal, vertical and diagonal orientations to investigate the signatures of different orientations in the human visual system. Furthermore, instead of the basic grating stimuli used in many neuroimaging studies, we employed stimuli containing real-world objects and extracted orientation-related features from them. Our results show that the oblique effect is evident even with these stimuli. In addition, we studied the temporal neural signature of wavelet approximation coefficients at different levels of decomposition. Due to the downsampling, the approximation coefficients at higher levels of decomposition contain a denser representation of objects. Accordingly, the time courses that represent the temporal neural traces of higher-level wavelet approximation coefficients peak later.

Experimental Design, Stimuli, and MEG Data Acquisition
We applied all analyses and inferences to the data of an experiment designed and conducted at MIT by Cichy et al. (2014). During this experiment, 92 stimuli from six distinct categories (human and non-human bodies and faces, natural and artificial images) were presented to 16 healthy human participants (N = 16) while MEG data were acquired. The images were displayed for 500 ms, with inter-stimulus intervals of 1.5-2 s. The participants completed 10-15 MEG runs, and every stimulus was shown twice in each run. For more details, see Cichy et al. (2014).

MEG Signal Preprocessing
MEG data were acquired from 306 sensor channels (Neuromag, Triux, Elekta, Stockholm) (Cichy et al., 2014). To compensate for head movement, we preprocessed the raw MEG data with the Maxfilter software (Elekta, Stockholm). The resulting signal was then denoised and analyzed using the Brainstorm software (Tadel et al., 2011). We extracted each trial from 150 ms before stimulus onset to 1,000 ms after stimulus onset (−150 to 1,000 ms). We then removed the baseline mean from each trial. Trials with a peak-to-peak amplitude greater than 6,000 fT were marked as bad trials and discarded. Furthermore, a low-pass filter with a cutoff frequency of 30 Hz was used to smooth the remaining trials. Finally, we used the frontal MEG sensors to automatically detect eye-blink artifacts and removed them by principal component analysis.
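The baseline removal and peak-to-peak rejection steps can be illustrated with a minimal NumPy sketch. It is not the Brainstorm pipeline used in the study. The function name `baseline_and_reject`, the array layout, and the rule that a trial is rejected when any channel exceeds the threshold are assumptions made for illustration only.

```python
import numpy as np

def baseline_and_reject(trials, times, max_ptp=6000.0):
    """Baseline-correct epochs and drop those with excessive peak-to-peak amplitude.

    trials : array (n_trials, n_channels, n_times), assumed to be in fT
    times  : array (n_times,), in ms relative to stimulus onset
    """
    trials = np.asarray(trials, dtype=float)
    # Subtract each channel's mean over the pre-stimulus window (t < 0).
    baseline = trials[:, :, times < 0].mean(axis=2, keepdims=True)
    corrected = trials - baseline
    # Peak-to-peak amplitude per trial and channel.
    ptp = corrected.max(axis=2) - corrected.min(axis=2)
    # Assumption: a trial is bad if ANY channel exceeds the threshold.
    keep = (ptp <= max_ptp).all(axis=1)
    return corrected[keep], keep

# Toy example: 3 trials, 4 channels, 1 ms sampling from -150 to 1,000 ms.
times = np.arange(-150, 1001)
rng = np.random.default_rng(0)
trials = rng.normal(0.0, 100.0, size=(3, 4, times.size))
trials[1, 2, 500] += 10000.0  # inject an artifact into the second trial
clean, keep = baseline_and_reject(trials, times)
```

In this toy example the second trial is discarded because of the injected spike, and the pre-stimulus mean of every remaining channel is zero after correction.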

Multivariate Pattern Analysis (MVPA)
Multivariate pattern classification is a well-suited approach to decode brain activity associated with different perceptual stimuli. According to this method, if a classifier discriminates between the MEG data of two different stimuli, these two stimuli are separable in the human brain (Wardle et al., 2016; Cichy and Pantazis, 2017; Diedrichsen and Kriegeskorte, 2017; Grootswagers et al., 2017). To measure the perceptual differences between the stimuli, we trained a linear pairwise SVM classifier (Chang and Lin, 2011) at each time point (every millisecond) of the MEG trials associated with every pair of stimuli. In other words, we built the binary linear SVM model using 306-dimensional MEG pattern vectors, which are the signal values of all MEG channels. To reduce noise and computational load, we permuted the trials randomly, divided them into K = 4 groups of 10 trials and averaged the trials within each group, resulting in 4 sub-averaged trials per stimulus. We used K−1 sub-averaged trials per condition for training and held out the remaining one, which was not used in the training phase, for testing the SVM classifier. This procedure was repeated 100 times to estimate the SVM classifier performance. The accuracy of the pairwise linear SVM classifier was used as a measure of dissimilarity between every pair of stimuli to populate a 92 × 92 representational dissimilarity matrix (RDM) (Figure 1B, upper part of the panel). Considering 92 × 92 possible pairs of stimuli and 1,151 time points in each trial, MVPA yields 1,151 symmetric MEG RDMs with undefined diagonals. Having calculated the grand average of every matrix at each time point, we plotted and traced the time course of object decoding in the human brain.
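The sub-averaging and hold-one-out scheme at a single time point can be sketched as follows. To keep the example dependent on NumPy only, the linear SVM is replaced by a simple stand-in linear classifier (projection onto the difference of the class means). This is explicitly not the LIBSVM classifier used in the study, and the function names are hypothetical.

```python
import numpy as np

def subaverage(trials, k, rng):
    """Randomly split trials (n_trials, n_channels) into k groups and average each."""
    order = rng.permutation(len(trials))
    return np.stack([trials[g].mean(axis=0) for g in np.array_split(order, k)])

def pairwise_accuracy(trials_a, trials_b, k=4, n_repeats=100, seed=0):
    """Decoding accuracy for one stimulus pair at one time point."""
    rng = np.random.default_rng(seed)
    correct = 0
    for _ in range(n_repeats):
        a = subaverage(trials_a, k, rng)
        b = subaverage(trials_b, k, rng)
        # Train on K-1 sub-averages per condition, test on the held-out one.
        mu_a, mu_b = a[:-1].mean(axis=0), b[:-1].mean(axis=0)
        # Stand-in linear classifier (NOT an SVM): difference of class means.
        w = mu_a - mu_b
        bias = -w @ (mu_a + mu_b) / 2.0
        correct += int(a[-1] @ w + bias > 0) + int(b[-1] @ w + bias < 0)
    return correct / (2 * n_repeats)

def decoding_rdm(trials_per_stimulus, **kwargs):
    """Fill a symmetric RDM with pairwise accuracies; the diagonal stays undefined."""
    n = len(trials_per_stimulus)
    rdm = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i + 1, n):
            acc = pairwise_accuracy(trials_per_stimulus[i], trials_per_stimulus[j], **kwargs)
            rdm[i, j] = rdm[j, i] = acc
    return rdm
```

Calling `decoding_rdm` once per time point on the 306-channel pattern vectors would give one RDM per millisecond. Swapping the stand-in classifier for a linear SVM changes only the three lines that compute `w` and `bias`.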

Wavelet Feature Descriptors
The multilevel wavelet transform decomposes a complex signal or an image into multiple simpler components that can be studied separately (Ravichandran et al., 2016). The discrete two-dimensional wavelet transform uses a set of discrete scale and translation functions to decompose an image into a set of mutually orthogonal wavelet descriptors (Mallat, 1989; Antonini et al., 1992; Graps, 1995; Stanković and Falkowski, 2003). Equations (1-4) (Ravichandran et al., 2016) define the wavelet transform for calculating the approximation, horizontal, vertical and diagonal sub-band descriptors, respectively. The functions $\varphi_{j,m,n}(x, y)$ and $\psi^{i}_{j,m,n}(x, y)$, defined in Equations (5, 6), describe the two-dimensional scaling and wavelet functions of scale (level) j at the pixel in row m and column n of an input image with M rows and N columns of pixels. These functions act as low-pass and high-pass filters followed by downsampling. The superscript i in Equation (6) denotes the orientation of the wavelet detail coefficients, which can be horizontal (H), vertical (V), or diagonal (D). The block diagram in Figure 2 describes one level of two-dimensional wavelet decomposition of an input image. According to this diagram, two finite impulse response (FIR) filters, a low-pass filter ($h_{\varphi}$) and a high-pass filter ($h_{\psi}$), selected from the collection of wavelet basis functions $\varphi$ and $\psi$ defined in Equations (5, 6), are applied to the columns or rows of an input image. To obtain the wavelet approximation coefficients, a sequence of two low-pass filters, each followed by downsampling (by a factor of 2), is applied to the columns and rows of the image. The horizontal and vertical detail sub-images are calculated by a combination of a low-pass and a high-pass filter, each followed by downsampling. As Figure 2 shows, swapping the order of the low-pass and high-pass filters switches the resulting orientation of the detail descriptors. To extract the diagonal detail descriptors, the same sequence of high-pass filtering and downsampling is applied to both the columns and the rows of the image.
To compute the wavelet descriptors of level j+1, the approximation descriptor of level j is used as the input image and the whole process is repeated in the same manner.
We employed the two-dimensional Haar wavelet on the gray-scaled stimuli to decompose them into four orthogonal sub-bands. The Haar mother wavelet $\psi$ and scaling function $\varphi$ are defined in Equations (7, 8).
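For reference, the standard textbook formulation of the two-dimensional discrete wavelet transform and of the Haar functions can be written as follows. The numbering is assumed to match Equations (1-8) referenced in the text. The image symbol $f(x, y)$ and the starting scale $j_0$ are notational assumptions. The two-dimensional functions are separable products of the one-dimensional $\varphi$ and $\psi$.

```latex
\begin{align}
W_{\varphi}(j_0, m, n) &= \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \varphi_{j_0,m,n}(x, y) \tag{1} \\
W_{\psi}^{H}(j, m, n) &= \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \psi^{H}_{j,m,n}(x, y) \tag{2} \\
W_{\psi}^{V}(j, m, n) &= \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \psi^{V}_{j,m,n}(x, y) \tag{3} \\
W_{\psi}^{D}(j, m, n) &= \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \psi^{D}_{j,m,n}(x, y) \tag{4} \\
\varphi_{j,m,n}(x, y) &= 2^{j/2}\, \varphi\!\left(2^{j}x - m,\; 2^{j}y - n\right) \tag{5} \\
\psi^{i}_{j,m,n}(x, y) &= 2^{j/2}\, \psi^{i}\!\left(2^{j}x - m,\; 2^{j}y - n\right), \quad i \in \{H, V, D\} \tag{6} \\
\psi(t) &= \begin{cases} 1, & 0 \le t < 1/2 \\ -1, & 1/2 \le t < 1 \\ 0, & \text{otherwise} \end{cases} \tag{7} \\
\varphi(t) &= \begin{cases} 1, & 0 \le t < 1 \\ 0, & \text{otherwise} \end{cases} \tag{8}
\end{align}
```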
The first row of Figure 1 depicts some examples of the original images and the second row illustrates their corresponding level-one approximation (A1) and horizontal (H1), vertical (V1) and diagonal (D1) high-frequency details. According to the block diagram in Figure 1, each level of decomposition provides four wavelet components whose row and column resolutions are half those of the previous level. All the original images have 175 rows and 175 columns. Due to the downsampling of the images in both rows and columns at each level of decomposition, the size of the approximation and detail sub-images at the fifth level is 6 × 6 (175, 88, 44, 22, 11, 6, with odd sizes rounded up when halved). All the approximation and detail sub-band matrices calculated at the different levels are flattened into vectors separately. To study the overall neural information carried by the details at each level, we concatenated the horizontal, vertical and diagonal detail vectors of each stimulus into an additional descriptor, which we call the 'all details' descriptor. In total, after 5 levels of decomposition with 5 different wavelet descriptors per level, 25 wavelet descriptor vectors are obtained for each stimulus. As shown in the bottom right corner of Figure 1, W1 to W25 denote the wavelet descriptors for the different sub-bands at the different levels.
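A minimal NumPy sketch of the five-level Haar decomposition and of the 25 descriptors per stimulus is given below. The function names, the edge-replication padding for odd-sized images, and the sign and naming convention of the detail sub-bands are assumptions. The padding is chosen here because it reproduces the 6 × 6 size at level five, and the software used in the study may handle borders differently.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the orthonormal 2-D Haar transform: returns (A, H, V, D)."""
    img = np.asarray(img, dtype=float)
    # Assumption: pad odd dimensions by repeating the last row/column.
    if img.shape[0] % 2:
        img = np.vstack([img, img[-1:, :]])
    if img.shape[1] % 2:
        img = np.hstack([img, img[:, -1:]])
    tl, tr = img[0::2, 0::2], img[0::2, 1::2]  # top-left, top-right of each 2x2 block
    bl, br = img[1::2, 0::2], img[1::2, 1::2]  # bottom-left, bottom-right
    A = (tl + tr + bl + br) / 2.0  # low-pass in both directions
    H = (tl + tr - bl - br) / 2.0  # change across rows: horizontal edges
    V = (tl - tr + bl - br) / 2.0  # change across columns: vertical edges
    D = (tl - tr - bl + br) / 2.0  # high-pass in both directions
    return A, H, V, D

def wavelet_descriptors(img, levels=5):
    """Return the flattened descriptors A, H, V, D and 'All' for each level."""
    descriptors = {}
    approx = img
    for level in range(1, levels + 1):
        approx, H, V, D = haar_dwt2(approx)
        descriptors[f"A{level}"] = approx.ravel()
        descriptors[f"H{level}"] = H.ravel()
        descriptors[f"V{level}"] = V.ravel()
        descriptors[f"D{level}"] = D.ravel()
        # 'All details' descriptor: concatenation of the three orientations.
        descriptors[f"All{level}"] = np.concatenate([H.ravel(), V.ravel(), D.ravel()])
    return descriptors

# A random 175 x 175 array stands in for a gray-scaled stimulus.
stimulus = np.random.default_rng(0).random((175, 175))
descriptors = wavelet_descriptors(stimulus)
```

With a 175 × 175 input this yields 25 vectors, and the level-five approximation has 36 entries (6 × 6).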

Wavelet Coefficients RDM
The representational dissimilarity matrix (RDM) maps the descriptors into a common space and summarizes the overall representational information of each wavelet descriptor. For each wavelet descriptor, we computed 1 minus Pearson's correlation coefficient as a dissimilarity measure between the wavelet descriptors of each pair of stimuli to construct a 92 × 92 RDM. Since we have 25 different wavelet descriptors per stimulus, this process results in 25 RDMs corresponding to the 25 wavelet descriptors (Figure 1B, lower part of the panel).
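The dissimilarity measure described here takes only a few lines of NumPy. The sketch assumes that the descriptor vectors of all stimuli are stacked row-wise in one array, and the function name is hypothetical.

```python
import numpy as np

def correlation_rdm(descriptors):
    """RDM of 1 - Pearson correlation between the rows of `descriptors`.

    descriptors : array (n_stimuli, n_features), one wavelet descriptor per row
    """
    rdm = 1.0 - np.corrcoef(np.asarray(descriptors, dtype=float))
    np.fill_diagonal(rdm, 0.0)  # remove rounding noise on the diagonal
    return rdm

# 92 random 36-dimensional vectors stand in for the level-five approximations.
features = np.random.default_rng(0).random((92, 36))
rdm = correlation_rdm(features)
```

The values range from 0 for perfectly correlated descriptors to 2 for perfectly anti-correlated ones.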

Representational Similarity Analysis
To trace the neural signature of the stimulus wavelet descriptors in the visual system, we used representational similarity analysis (RSA). We mapped the MEG data of each time point to the representational space (92 × 92 RDMs) using MVPA. Similarly, we mapped the 25 wavelet descriptors to the representational space of 92 × 92 RDMs. Therefore, in this common two-dimensional (92 × 92) matrix space, we can compare the wavelet RDMs with the neural RDMs using the Spearman correlation between the wavelet descriptor RDMs and the MEG RDMs at each time point. This results in 25 time courses representing the temporal neural traces of the wavelet descriptors in human object recognition (Figure 1B).
FIGURE 2 | One level of two-dimensional wavelet decomposition of an input image. A sequence of low-pass ($h_{\varphi}$) and high-pass ($h_{\psi}$) filters, selected from the collection of wavelet basis functions $\varphi$ and $\psi$, is applied to the columns (n) and rows (m) of images. While $h_{\varphi}(n)$ and $h_{\psi}(n)$ indicate that the filters are applied to columns, $h_{\varphi}(m)$ and $h_{\psi}(m)$ indicate that the filters are applied to rows. All filters are followed by downsampling by a factor of 2 on the filtered rows or columns. Applying two low-pass filters ($h_{\varphi}(n)$ and $h_{\varphi}(m)$), each followed by downsampling, to the first-level sub-image (A1) provides the approximation sub-image of the second level (A2). The second-level horizontal and vertical sub-images (H2, V2) are calculated by applying a combination of low-pass and high-pass filters and downsampling to A1 in the specified orders. To extract the diagonal detail descriptors of the second level (D2), high-pass filtering and downsampling by a factor of 2 are applied sequentially to both the columns and the rows of the sub-image A1.
Frontiers in Neural Circuits | www.frontiersin.org
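The comparison between a wavelet RDM and the MEG RDMs over time can be sketched as below. Restricting the correlation to the upper triangle of each RDM is an assumption, made because the matrices are symmetric and their diagonals are undefined. The Spearman coefficient is computed as the Pearson correlation of tie-averaged ranks, and all function names are hypothetical.

```python
import numpy as np

def rank_average(x):
    """Ranks starting at 1, with ties replaced by their average rank."""
    x = np.asarray(x, dtype=float).ravel()
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)               # highest rank within each tie group
    mean_rank = last_rank - (counts - 1) / 2.0  # average rank of the group
    return mean_rank[inverse]

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return np.corrcoef(rank_average(a), rank_average(b))[0, 1]

def rsa_timecourse(meg_rdms, model_rdm):
    """Correlate one model RDM with the MEG RDM at every time point.

    meg_rdms  : array (n_times, n_stimuli, n_stimuli)
    model_rdm : array (n_stimuli, n_stimuli)
    """
    upper = np.triu_indices(model_rdm.shape[0], k=1)  # skip the diagonal
    model_vec = model_rdm[upper]
    return np.array([spearman(rdm[upper], model_vec) for rdm in meg_rdms])
```

Applying `rsa_timecourse` once per wavelet descriptor would give the 25 time courses, one value per millisecond each.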

Statistical Testing
To identify the significant time points of the time series, we performed a non-parametric sign permutation test (Pantazis et al., 2005; Nichols, 2012), which is broadly used in neuroimaging studies (Hayasaka and Nichols, 2004; Mirman et al., 2016; Mohsenzadeh et al., 2018). Permutation and bootstrapping were done with a sample size (n = 16) equal to the number of subjects. We used 1,000 bootstrap samples. In each bootstrap sample, we drew 16 time series with replacement from all subject time courses and estimated the significant time points and the onset time. Since the time series carry the correlations between the MEG pattern classification results and the wavelet descriptors, the null hypothesis is that there is no signal or dynamics in the time series. To perform the sign permutation test, we randomly permuted the condition labels of the MEG data, so that the subjects' responses were randomly multiplied by +1 or −1. These operations (permutation and bootstrapping with sample size 16) were repeated 1,000 times, yielding a one-dimensional p-value map over time. We then applied cluster correction to control the error across all time points. The cluster-defining threshold was set to 0.05. According to this test, if the size of a cluster of connected time points was greater than the threshold, these time points were considered significant. We used bootstrapping to estimate the peak and onset latencies of the time courses. The time series of the subjects were bootstrapped and averaged across subjects 1,000 times. The standard error of measurement (SEM) and the 95% confidence intervals were derived from the distribution of the peaks or onsets obtained over all bootstrap samples.
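One common way to implement a sign permutation test with cluster-size correction is sketched below. It is a simplified, one-sided variant written for illustration and is not necessarily the exact procedure used in the study. The function names, the right-tailed test, and the use of the maximum cluster size as the correction statistic are assumptions.

```python
import numpy as np

def find_clusters(mask):
    """Return (start, stop) index pairs for each run of True values in a 1-D mask."""
    padded = np.concatenate(([0], np.asarray(mask, dtype=int), [0]))
    change = np.diff(padded)
    starts = np.flatnonzero(change == 1)
    stops = np.flatnonzero(change == -1)
    return [(int(s), int(e)) for s, e in zip(starts, stops)]

def sign_permutation_cluster_test(data, n_perm=1000, cdt=0.05, alpha=0.05, seed=0):
    """data : array (n_subjects, n_times) of per-subject correlation time courses."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n_sub, n_time = data.shape
    observed = data.mean(axis=0)
    # Null distribution: flip the sign of each subject's whole time course.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, n_sub))
    null = signs @ data / n_sub                      # shape (n_perm, n_time)
    # Cluster-defining threshold at every time point (right tail).
    thresh = np.quantile(null, 1.0 - cdt, axis=0)
    # Largest supra-threshold cluster in each permutation sample.
    null_max = np.array([
        max((e - s for s, e in find_clusters(row > thresh)), default=0)
        for row in null
    ])
    size_thresh = np.quantile(null_max, 1.0 - alpha)
    # Keep only the observed clusters larger than the null cluster sizes.
    significant = np.zeros(n_time, dtype=bool)
    for s, e in find_clusters(observed > thresh):
        if e - s > size_thresh:
            significant[s:e] = True
    return significant, thresh
```

The returned boolean mask marks the time points that belong to significant clusters. Every such point also exceeds the per-time-point threshold by construction.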

RESULTS AND DISCUSSION
We explored the temporal relation between brain activity and wavelet representations of real-world images. The neural signatures of the wavelet components in the human brain during an object vision task are presented separately for the five levels of wavelet decomposition. These components consist of the approximation and sparse detail sub-bands of the two-dimensional Haar wavelet transform. We also investigated the neural information encoded in the visual system regarding the wavelet descriptors of images.

Representational Similarity Comparison of Brain Data With Wavelet Approximation and Detail
First, we investigated how the wavelet representations (approximation and details) of images at different levels of decomposition are encoded in neural data. To this end, we used multivariate pattern analysis on the MEG data to compute the neural representations, calculated the wavelet descriptors of the images, and created their corresponding RDMs. One example of the approximation (A1) and detail descriptors, including horizontal (H1), vertical (V1), diagonal (D1) and 'all details', is shown from left to right in Figure 3A. Figures 3B,C show the time courses obtained from the Spearman correlation between the MEG RDM at each time point and the wavelet approximation and detail RDMs. Color-coded solid lines above the time courses mark the significant time points for each curve. All significant time points were found with non-parametric permutation tests using a cluster-defining threshold of P < 0.05 and a corrected significance level of P < 0.05 (N = 16). As can be seen in Figure 3C, the curves representing the neural dynamics of the wavelet approximations at all five levels follow the same trends. With regard to the significant time points, we found that the neural signatures of wavelet approximation coefficients are sustained, while they are transient for the detail descriptors. Research on the spatiotemporal dynamics of object recognition in the human brain (Cichy et al., 2014) demonstrates that neuronal processing of objects can be both transient and persistent. As Figure 3A illustrates, the 'all details' descriptor, which aggregates the detail descriptors in three orientations (horizontal, vertical and diagonal), captures the overall shape and some categorical information of objects. For this reason, it can act as a discriminant feature in human object recognition, but it still lacks other low-, mid- and high-level features.
However, wavelet approximation descriptors contain a broader range of high-level semantics of images, which may be processed and maintained in later visual processing areas of the human brain such as the inferior temporal (IT), fusiform, and parahippocampal cortex (PHC).

Comparison of Peak Latencies in Different Levels of Wavelet Approximation Time Courses
Given the sustained neural information in the wavelet approximation time courses at all five levels, we further asked how categorical information is represented at the different levels. To this end, we estimated the peak latency of these time courses using the sign permutation test (N = 16; p < 0.05). We found that, as the level of the wavelet approximation coefficients increases, the peak latency of the corresponding time course occurs later. This can be explained by the fact that the sub-band images extracted from higher levels of wavelet decomposition are less sparse and provide a denser representation of the stimuli, which accentuates the semantic information contained in the stimuli. The first and second columns of Table 1 report the peak latency of the wavelet approximation time courses with the 95% confidence interval and as mean ± SEM, where SEM is the standard error of measurement. The third column reports the onset time. Figures 4A,B illustrate the peak latencies of the wavelet approximation coefficients at different levels of decomposition with error bars and box plots obtained from the bootstrap test. To remove outliers, the top and bottom 5% of data points were discarded. Figure 5 displays the histograms of peak latencies at the different levels using 1,000 bootstrap samples; the red curves show the normal density functions fitted to the histograms. Since object recognition and categorization in the human brain occur earlier than 150 ms after stimulus onset (Thorpe et al., 1996; Isik et al., 2013), we kept the data points with peak latencies of at most 150 ms and discarded those with peak latencies greater than 150 ms, which make up a small portion of all data points. The mean peak latencies of the distributions fitted to the histograms in Figure 5 are 108, 110, 123, 132, and 136 ms for level 1 to level 5, respectively.
Similarly, as Figures 4A,B confirm, the peak latencies of the wavelet approximation time courses increase with level. This suggests that approximation descriptors at higher levels carry more categorical and semantic visual information that needs to be processed in later visual processing areas along the ventral stream pathway.
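The bootstrap estimate of peak latency reported in this section can be sketched as follows. The function name and the use of percentile confidence intervals are assumptions. The SEM is taken here as the standard deviation of the bootstrap distribution, and the synthetic bump peaking at 120 ms is illustrative data only.

```python
import numpy as np

def bootstrap_peak_latency(timecourses, times, n_boot=1000, seed=0):
    """Bootstrap the peak latency of the subject-averaged time course.

    timecourses : array (n_subjects, n_times)
    times       : array (n_times,), in ms relative to stimulus onset
    Returns (mean latency, SEM, 95% confidence interval).
    """
    rng = np.random.default_rng(seed)
    n_sub = timecourses.shape[0]
    peaks = np.empty(n_boot)
    for b in range(n_boot):
        # Resample subjects with replacement and average their time courses.
        idx = rng.integers(0, n_sub, size=n_sub)
        peaks[b] = times[timecourses[idx].mean(axis=0).argmax()]
    ci = np.percentile(peaks, [2.5, 97.5])
    return peaks.mean(), peaks.std(ddof=1), ci

# Synthetic example: 16 subjects whose time courses all peak at 120 ms.
times = np.arange(-150, 1001)
bump = np.exp(-((times - 120) ** 2) / (2 * 30.0 ** 2))
timecourses = np.linspace(0.5, 1.5, 16)[:, None] * bump
mean_peak, sem, ci = bootstrap_peak_latency(timecourses, times)
```

Because every synthetic subject peaks at the same latency, the bootstrap distribution collapses onto 120 ms. With real data the spread of `peaks` provides the SEM and the confidence interval.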

Representational Similarity Analysis Infers the Oblique Effect for Wavelet Details Descriptors
To further study the signature of wavelet detail coefficients of different orientations, we calculated the wavelet detail RDM for each orientation separately. Figure 7F represents the sum of all significant time points for the different orientations. As shown, the overall number of significant time points of the horizontal and vertical wavelet time courses is noticeably greater than that of the diagonal wavelet time course. This suggests that vertical and horizontal details are more strongly represented in the human brain.
Most experimental studies have designed or used specific stimuli, such as gratings, to study the oblique effect. One previous study employed MVPA to investigate the decoding of various orientations with a set of six different grating stimuli, and its results confirmed the oblique effect in the human visual system. In our study, we inferred the oblique effect in the human visual system by using the wavelet detail descriptors of real-world images as representatives of orientations. This inference is based on the number of significant time points, which can be interpreted as the time points at which there is a significant and meaningful correlation between the MEG data and the corresponding wavelet orientation details.

CONCLUSION
We explored the neural dynamics of wavelet approximation and detail sub-bands in human vision. Although orientation-specific stimuli were not used in this study, our results revealed that MVPA is a well-suited approach for inferring the implicit oblique effect in the human visual system. Furthermore, we found that while the signature of the wavelet detail descriptors was transient, there was a sustained significant correlation between the approximation descriptors and the neural data. Our results on the time courses of the wavelet approximation coefficients indicated that the peak latencies of the correlation time series increased with the approximation level. This effect implies that decreasing the size of the images and increasing the level of the wavelet approximation coefficients reduces sparsity and highlights semantic and categorical information of objects in the human visual system.

ETHICS STATEMENT
The stimuli and MEG data of this study are provided by Cichy et al. (2014). Their study was in compliance with the Declaration of Helsinki and approved by the Institutional Review Board of the Massachusetts Institute of Technology.

AUTHOR CONTRIBUTIONS
EH analyzed the MEG data and stimuli and wrote the manuscript. AT assisted with the interpretation of results and advised on analysis and writing process.
Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Copyright © 2019 Hatamimajoumerd and Talebpour. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.