Statistical Analysis of Coding for Molecular Properties in the Olfactory Bulb

The relationship between molecular properties of odorants and neural activities is arguably one of the most important issues in olfaction, and the rules governing this relationship are still not clear. In the olfactory bulb (OB), glomeruli relay olfactory information to second-order neurons, which in turn project to cortical areas. We investigate the relevance of odorant properties, the spatial localization of glomerular coding sites, and the size of coding zones in a dataset of [14C] 2-deoxyglucose images of glomeruli over the entire OB of the rat. We relate molecular properties to activation of glomeruli in the OB using a non-parametric statistical test and a support-vector machine classification study. Our method makes it possible to systematically map the topographic representation of various classes of odorants in the OB. Our results suggest many localized coding sites for particular molecular properties, and some molecular properties that could form the basis for a spatial map of olfactory information. We found that alkynes, alkanes, alkenes, and amines affect activation maps very strongly compared to other properties. Amines, sulfur-containing compounds, and alkynes have small zones and high relevance to activation changes, while aromatics, alkanes, and carboxylic acids recruit very big zones in the dataset. The results suggest a local spatial encoding for molecular properties.

resampling technique for localization of receptive fields for molecular properties. Additionally, support-vector machine (SVM) classification was used as a non-probabilistic binary linear classifier to test the relevance of molecular features. We chose an SVM because its superior performance compared to other classification techniques (Kotsiantis et al., 2007) makes it an algorithm of reference. By combining the two approaches, we obtain results on the relevance of odorant properties and on the localization and size of coding zones. We first explain the data and methods, then present the results, and finally discuss them in the broader context of olfactory coding.
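The bootstrapped rank-sum test mentioned above can be sketched in a few lines. This is a minimal, self-contained Python sketch, not the authors' Matlab implementation. It makes the following assumptions:

- It uses the two-sided normal approximation of the Wilcoxon rank-sum statistic, without tie or continuity correction.
- The function names (`average_ranks`, `rank_sum_p`, `bootstrap_rank_sum`) are hypothetical.
- The default of 1000 bootstrap iterations is illustrative, because the number of iterations is not stated in the text.

For one pixel, the sketch resamples the activations with the property (A_P) and without it (A_¬P), each with replacement. It returns the median of the resulting p-values and their interquartile range, playing the roles of the bootstrap statistic and the error estimate described in the Methods.

```python
import math
import random
from statistics import median, quantiles


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation).

    Assumption: no tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2              # expected rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def bootstrap_rank_sum(a_p, a_not_p, n_boot=1000, seed=0):
    """Median and interquartile range of bootstrapped rank-sum p-values for one pixel."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        x = rng.choices(a_p, k=len(a_p))
        y = rng.choices(a_not_p, k=len(a_not_p))
        pvals.append(rank_sum_p(x, y))
    q1, _, q3 = quantiles(pvals, n=4)
    return median(pvals), q3 - q1


# Toy pixel: activations with the property are all higher than without it.
a_p = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
a_not_p = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
median_p, iqr = bootstrap_rank_sum(a_p, a_not_p, n_boot=200)
print(median_p < 0.05, iqr < 0.1)  # True True
```

Under the criteria stated in the Methods, a pixel would count as coding for the property only if its median p-value is below 0.05 and its bootstrap error (here the interquartile range) is below 0.1.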

Data
In this study we used a set of 2-deoxyglucose images of glomeruli over the entire bulb of the rat (Johnson et al., 2002; http://gara.bio.uci.edu/index.jsp), which were collected by Michael Leon's group over the course of many years. In the experiments, unanesthetized, freely respiring animals were stimulated with odorants through a delivery system, and the bulbs were then sectioned and imaged. Each of these images corresponds to glomerular responses to one particular compound. According to Johnson et al. (2006), the method is capable of resolution down to a single glomerulus. Leon and Johnson (2009) averaged over left and right bulbs and over the arrays from all animals exposed to the same concentration of the same compound (around three to five animals for each combination). An orientation for these images, annotated with anatomical terms of location, can be seen in Figure 1.
We used descriptive information, also provided by the lab of Michael Leon, which was matched to the images by CAS number or chemical name. Ten images had to be discarded because information about the compound was missing. We extracted about 200 descriptors in total, which include physicochemical odorant properties as well as perceptual properties ascribed to the sensed odorant. Properties are of continuous or binary type. Continuous properties include molecular length, height, and weight. Binary properties concern, for example:
- cyclization: whether an odorant is alicyclic, aromatic, polycyclic, or heterocyclic;
- bond saturation: whether an odorant is an alkene, alkane, or alkyne;
- functional groups: whether an odorant is an ester or lactone, an amine, or a carboxylic acid, contains sulfur, contains halogen, or is a ketone, alcohol, or phenol.
Perceptual properties are all binary and include descriptors such as sweet, camphoraceous, floral, and minty. Some properties had many associated activity maps, others very few. We excluded from the analysis binary properties whose representation was too skewed (fewer than four images in the infrequent category).
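The exclusion rule for skewed binary properties amounts to a small filter. This is a minimal sketch assuming that each binary property is stored as a list of 0/1 flags, one per activation map. The function name `keep_balanced_properties`, the property name `rare_group` and this data layout are hypothetical.

```python
def keep_balanced_properties(labels, min_count=4):
    """Return binary properties whose infrequent category has at least min_count maps.

    labels: dict mapping a (hypothetical) property name to a list of 0/1 flags,
    one flag per activation map (1 = property present).
    """
    kept = []
    for name, flags in labels.items():
        present = sum(flags)
        absent = len(flags) - present
        # Drop the property if either category has fewer than min_count maps.
        if min(present, absent) >= min_count:
            kept.append(name)
    return kept


labels = {
    "aromatic": [1] * 10 + [0] * 20,
    "rare_group": [1] * 3 + [0] * 27,  # only 3 maps with the property: excluded
}
print(keep_balanced_properties(labels))  # ['aromatic']
```

Checking both categories also covers the symmetric case, where a property is present in almost every map and the "absent" category is the rare one.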
In pre-processing, we mean-centered all pixels of each image, so that each map shows the activation at each pixel relative to its overall pattern. We then scaled the deviations to unit standard deviation to compensate for differences in absolute pixel intensities. We started with 472 maps, some of which represented responses to identical odorants at different concentrations. For some images the ligand concentration was unknown, and we had too few responses to the same ligand to analyze dose-response relationships. We generally observed that glomerular recruitment increases with concentration and that maps become unspecific at high doses. We eliminated by visual inspection five highly saturated maps, in which most or all of the glomeruli were activated. We then took means over response maps corresponding to the same odorant at different concentrations (222 maps at, on average, 3 concentrations). This selection left us with 308 maps, each corresponding to activation responses to a distinct odorant. Missing values, caused either by loss of tissue on the knife during cryosectioning or by loss of tissue during removal of the bulbs from the skull with microdissecting scissors, were ignored in the analysis, which left us with 1834 pixels.
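The normalization and averaging steps above can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions:

- each map is a flat list of pixel values, with `None` marking missing tissue;
- normalization uses the population standard deviation;
- maps are z-scored before being averaged across concentrations;
- a pixel is discarded if it is missing in any averaged map.

The visual removal of saturated maps is not modeled. All names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_map(values):
    """Mean-center one map and scale it to unit (population) standard deviation.

    Missing pixels are given as None and stay None. A constant map would
    raise ZeroDivisionError; such maps are assumed to have been removed.
    """
    valid = [v for v in values if v is not None]
    mu, sd = mean(valid), pstdev(valid)
    return [None if v is None else (v - mu) / sd for v in values]


def average_by_odorant(maps):
    """Average the z-scored maps of each odorant (e.g. across concentrations).

    maps: list of (odorant_id, values) pairs. A pixel becomes None if it is
    missing in any of the odorant's maps.
    """
    groups = defaultdict(list)
    for odorant, values in maps:
        groups[odorant].append(zscore_map(values))
    return {
        odorant: [
            None if any(z[i] is None for z in zmaps) else mean(z[i] for z in zmaps)
            for i in range(len(zmaps[0]))
        ]
        for odorant, zmaps in groups.items()
    }


def drop_missing_pixels(averaged):
    """Keep only the pixel indices that are present in every averaged map."""
    n_pixels = len(next(iter(averaged.values())))
    keep = [i for i in range(n_pixels)
            if all(m[i] is not None for m in averaged.values())]
    return keep, {k: [m[i] for i in keep] for k, m in averaged.items()}


# Toy example: two concentrations of odorant A (one with a missing pixel) and one map for B.
maps = [
    ("odorA", [1.0, 2.0, 3.0, None]),
    ("odorA", [2.0, 4.0, 6.0, 8.0]),
    ("odorB", [0.0, 1.0, 0.0, 1.0]),
]
averaged = average_by_odorant(maps)
keep, clean = drop_missing_pixels(averaged)
print(keep)  # [0, 1, 2]
```

Z-scoring each map before averaging keeps high-concentration maps, which tend to have larger absolute intensities, from dominating the odorant's mean map.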

Methods
Localization of coding zones
For each molecular property, we tested statistically whether a pixel showed significant differences with respect to the property. For binary properties, we used a non-parametric statistical test to compare activations on images where the property was present with activations on images where it was absent. The Wilcoxon rank-sum test (also called the Mann-Whitney U test) assesses whether two samples come from the same distribution (null hypothesis). For some properties, only very few maps were available. To account for statistical variation in these distributions, we applied a bootstrap resampling procedure (Efron, 1982) in all tests to estimate the p-values. In bootstrapping, a statistical analysis is repeatedly applied to subpopulations of the same size, generated by sampling from the original population with replacement. Bootstrap methods can be used for hypothesis tests and for regression analysis, and they allow the estimation of the distribution of almost any statistic when only few samples are available. Using the Wilcoxon rank-sum test is analogous to applying Student's t-test to the data after ranking over the combined samples. The bootstrapped variant has several advantages: it does not assume normality, it is more robust to outliers, and it allows the two samples to be of arbitrary (unequal) sizes. We compared the activations given the binary property, A_P, against the activations not given the binary property, A_¬P.
At each iteration of the bootstrap, we randomly sampled from the two distributions with replacement before applying the Wilcoxon rank-sum test. The resulting distribution of p-values was log-normal. We therefore took the median p-value as the bootstrap statistic and used these median p-values for subsequent analysis (cf. Limpert et al., 2001). As an estimate of the bootstrap error, we took the interquartile range of the sampled p-values. There was a strong and highly significant positive Pearson correlation between error and p-values (r = 0.77, p = 0.001). About 94% of points below the 0.05 significance level had an associated error below 0.1, and we only took these points into account. We say that a point codes for a (binary) property if the null hypothesis could be rejected at the 5% significance level. More formally, the coding of a property is a signed score computed from two quantities: p, the bootstrapped p-value of the statistical test, and bigger, the bootstrap statistic comparing A_P with A_¬P. The obtained value indicates how strongly odor-induced activity is associated with the property (or with its absence, for negative values). If the test is not significant at the threshold, the value is set to 0. We only took into account values above 0.
For continuous properties, the procedure was more involved. We discretized each property by grouping its values into bins. We took the number of bins from Sturges' formula (cf. Wand, 1997) as a first guess and then adjusted it so that each bin contained at least roughly 5% of the activation maps. We then applied the bootstrapped Wilcoxon rank-sum procedure in a one-against-all fashion: activations in response to values in a particular bin were compared against activations in response to values outside that bin.

Size of coding zones
To determine the size of the coding area for a property, we counted the points that responded significantly differently when the property was present than when it was absent. Skewed distributions for some properties could affect how many points are found to be significantly related to a property. It is therefore important to note that, for the properties under consideration, data availability (the number of images corresponding to a given molecular property) and size of coding zone showed no significant Pearson correlation (r = 0.33, p = 0.27). We took into account 13 binary molecular properties for which at least four images were available.

Classification
We take classification performance as a measure of the structure-activation relationship between glomerular activations and odorant features, and use it to quantify the relevance of molecular properties to glomerular coding. We classified using a linear SVM (Vapnik, 1995), with glomerular activations as the input vector and each binary property (present vs. not present) as the binary target. In each of 10 iterations, we randomly sampled half of the activation maps as the training set and used the other half as the test set. We distinguished between two experimental conditions: (1) best points, i.e., classification using the most representative points, and (2) random baseline, i.e., classification using randomly sampled points.
For the first experimental condition, we sorted the points for each property in descending order of their relevance to the property (p-values from the Wilcoxon rank-sum test). We then classified using the best n points, with n ∈ N = {1, 5, …, 30, 45, 50, 60, …, 150, 200, 300, …, 1700, 1834} (36 steps). For the random baseline, we used the same intervals N for each property but sampled the points randomly, averaging over 250 random subsamples of points for each interval. We used an in-house linear SVM classifier implemented in Matlab. We found it more robust than other SVM implementations, possibly because of MathWorks' quadprog function. As performance criterion we used the area under the ROC curve (AUC; cf. Bradley, 1997), where the ROC curve plots the fraction of true positives against the fraction of false positives. The AUC has the advantage of being insensitive to skewed class distributions, which are a particular problem in our data set. An example of such an experimental run for the aromatic property is shown in Figure 4.

Results
Localization of coding zones
Figure 2 shows loci of coding zones for the 13 molecular properties. See Figure 1 for anatomical terms of location in a ventral-centered chart of the glomerular layer of the rat OB. For the other chemical properties displayed in Figure 2, we grouped properties into molecular bonds, cyclization, and functional groups; we then show mappings for ranges of molecular length and carbon number. Due to space constraints we excluded many maps, such as those for surface area and water solubility. Colors in the figures distinguish zones that are activated preferentially for a specific combination of several binary properties. For space efficiency, legends refer to numbers that are explained in the figure caption. Figure 2A shows the coding zone for the alkane property as a demonstration of our statistical method for determining loci for a single property. Red indicates pixels that were significantly higher in the presence of alkane.
We created a factorial code so that the color code accounts for all combinations of coding for properties. For n properties, the numbers from 0 to n − 1 were assigned to the properties. For each pixel, a binary vector b_prop ∈ {0, 1}^n expresses whether each property was found to be significant or not, with the ith position standing for property i. Each subset was assigned its own color. The factorial maps for all cyclization properties and for all functional-group properties were too crowded, so each was split into two to make it more intelligible.
In Figure 2B we show locations responsive to the molecular bond properties alkane, alkene, and alkyne, and to their combinations. There are seven kinds of zones, which code for different combinations of these three properties:
- Zones 1, 2, and 4 code for exclusively one of these properties.
- Zone 3 encodes alkane and alkene.
- Zone 5 encodes alkane and alkyne.
- Zone 6 encodes alkene and alkyne.
- Zone 7 codes for all three properties.
Figure 2C shows aromatic and alicyclic. Cyclization properties, especially alicyclic, have moderate but highly significant Pearson correlations: r = 0.33, p = 2.05 × 10^−9 between alicyclic and polycyclic; r = 0.36, p = 6.72 × 10^−11 between alicyclic and heterocyclic; and r = 0.21, p = 1.95 × 10^−4 between aromatic and heterocyclic. As can be seen in Figures 2C,D, aromatic and heterocyclic project to very similar bulbar regions, as do alicyclic and polycyclic.
Functional groups did not show such high covariance. Figure 2E highlights responsive zones for the functional group properties amine, ketone, and alcohol-phenol. Figure 2F gives loci for the functional group properties ester + lactone (we grouped these two properties together), carboxylic acid, and sulfur-containing compound.
We show additional properties in Figure 3: Figure 3A shows areas associated with molecular length across the OB, and Figure 3B shows carbon numbers.

Size of coding zones
Table 1 shows the size of coding zones as estimated by the procedure explained above. Aromatic is broadly coded by glomerular activations: nearly 60% of points showed differences significant at the 5% level. Alkane covers the second-biggest area, with about 40% of points. Carboxylic acid and ketone are each coded by about a third of all points. Coding zones for alkene, alicyclic, and heterocyclic extend to between about 20 and 30%. For ester + lactone, alkyne, and alcohol + phenol, the coding zones measured between 10 and 16% of the total. Polycyclic, sulfur-containing compound, and amine recruit the smallest zones of the compared properties, with about 7, 4, and 0.5%, respectively.
Table 1. Size of coding zones. For each property, the number of points found to be significantly correlated at the 5% significance level. The second column gives the absolute size in pixels and the third the size relative to all available pixels.
Property   Size (pixels)   Size (%)
…
Amine   10   0.5

Classification
Table 2 ranks properties according to the classification performance (AUC) of the linear SVM; the classification performance is given in the second column. It is important to note that the Pearson correlation between classification performance and data availability is low and not significant (r = 0.2, p = 0.5). Of the 13 compared properties, sulfur-containing compound, alkyne, alkane, alkene, and amine perform close to ceiling. Classifications of carboxylic acid, aromatic, and ketone also show good performance. Polycyclic, ester-lactone, the functional group alcohol + phenol, and the cyclization properties heterocyclic and alicyclic give mediocre performance.

Discussion
We present a statistical procedure for decoding continuous behavioral properties to investigate coding at the glomerular level of the OB. It consists of a univariate statistical test, the Wilcoxon rank-sum test within a bootstrap wrapper, and, as a multivariate extension, an SVM classification procedure. This approach yields systematic and quantitative results without assumptions about distribution parameters, and it allows comparison of vectors of unequal lengths. With our method we could map odorant properties to clustered zones and obtain results on the relevance of these properties.
Partly because of space constraints, we only show maps for several properties in this paper. Figure 2 illustrates the localization of coding zones for properties pertaining to important coding dimensions (as discussed in …). Please refer to Figure 1 for anatomical terms of location in the bulb. It should be mentioned that we did not control for correlations between molecular properties in our study; long molecules, for example, are less water soluble. With respect to co-activation, areas for molecular length correlated highly with those for molecular elongation, surface area, and water solubility (not shown). Carbon number is known to be associated with molecular length, volume, and hydrophobicity, among other properties (cf. …). Many studies examine only a small set of odorants at differing concentration levels, so comparisons across studies are not always straightforward. However, we think that our results are generally in good agreement with the conclusions of several studies, including …. Meister and Bonhoeffer (2001) and Rubin and Katz (1999) showed that many glomeruli are sharply tuned to a small range of chain lengths. … conclude that there is a progression with increasing carbon number from medial and lateral areas into ventral areas. Our results (cf. Figure 3B) show mostly medial areas for up to seven carbon atoms, ventral-medial areas for molecules of 8-13 carbon atoms, and ventral areas for 14-21 carbon atoms. Leon and Johnson (2009) also describe areas for molecular length that are consistent with our results (cf. Figure 3A): zones in dorsal areas were active for shorter chains, while longer molecules activated zones more ventrally. Our results confirm both findings.
Our results for carboxylic acids (cf. Figure 2F) coincide with … in locating responsive glomeruli in anterior medial/lateral areas. We also confirm Leon and Johnson's (2009) placement of regions for alcohols and phenols in lateral/medial areas, more caudal than the regions for carboxylic acids (cf. Figure 2E). We also found areas for esters that overlap with their placement of aliphatic esters (cf. Figure 2F), located in some central medial and lateral areas just next to alcohols. We found another area for esters in caudal areas of the lateral and medial bulb. … placed a zone for aromatics with O groups and ketones in a dorsal region, and one for aromatic hydrocarbons in a dorsal caudal region. We found a large region for aromatics (cf. Figure 2C), mostly dorsal but also medial and lateral, and a region for ketones (cf. Figure 2E) in similar dorsal areas.

Size of coding zone
As an extension of our statistical procedure, we compared the sizes of coding zones. We are not aware of other attempts in the literature to quantify the sizes of coding areas. These results should be interpreted qualitatively rather than quantitatively. Thresholding p-values at a given significance level (here 5%) means that the effects of concentration and relevance cannot be completely separated. However, the presented results could serve to group properties roughly by the size of their coding zones. Together with the localization maps shown above, the size results indicate that the encoding of certain properties is specific to particular spatial areas.
Size estimates differed widely across the investigated properties: some properties recruit large zones and others small ones. The properties with the smallest coding zones are amine and sulfur-containing compound, with roughly 0.5 and 4.1% of the area recruited, respectively.
Larger coding zones could mean that properties are broadly sensed by a range of olfactory receptors (ORs). Conversely, it could be hypothesized that properties with small coding zones are more directly related to the proposed odotopes, especially properties with high relevance to coding. Based on the results in Tables 1 and 2, amine, sulfur-containing compound, and alkyne could be such candidates.

Relevance of properties
We classified molecular properties from glomerular activations in order to estimate their impact on early olfactory coding. The logic is that properties that greatly change activations at the OB level should be easier to classify. Knowing the relevance of molecular properties could provide insight into early coding of chemical information. It could also provide vital clues about which properties determine the degree of interaction between an OR and odorant molecules. We define relevance as the best classification performance obtained with either the most representative points or the random baseline, whichever was higher.
For some properties, the performance curve for the most representative points was below the baseline at some intervals (compare Figure 4). We think this is due to the imperfect definition of most representative points. Classification performance for amine peaked early, at 200 points, and stayed high until 600 points before falling off sharply. The early peak could be explained by the very small area corresponding to amine (compare Table 1); in fact, activation maps for amine looked very different from each other. Taking more points would add noise rather than information to the classification. We also think that the SVM had difficulties because very few data samples corresponded to amine (4 of 308 maps).
Our classification results indicate that some properties affect odor coding at the OB level very strongly. The most relevant properties are alkyne, alkane, alkene, and amine. From our study, we speculate that these properties could have a very strong impact on olfactory processing in rats. Doleman et al. (1998) suggested that increased olfactory sensitivity (in humans) for alkylamines or alkylthiols, compared with alkanes or alcohols, could be accounted for by evolutionary adaptation for detecting decaying food and toxic gases, because amines are associated with harmful putrid food. Activations are very distinct with respect to whether an odorant contains a sulfur functional group or not. Bond saturation, which indicates the reactivity of compounds, also seems to affect coding very strongly, as the high performance of alkyne, alkane, and alkene shows. Carboxylic acid, another functional group, and aromatic, a cyclization property, also seem to be quite important. Our results confirm that cyclization, bond saturation, and some functional groups are very important. This is in line with …, who proposed cyclization, carbon number, bond saturation, branching, functional groups, and substitution position as important dimensions of molecular properties. Our results also partly confirm Yoshida and Mori (2007), who proposed 14 primary odorant categories that could serve to enhance category-profile selectivity: sulfides, alcohols, methoxypyrazines, 6-carbon and 9-carbon green-odor compounds, aldehydes, ketones, isothiocyanates, terpene hydrocarbons, esters, terpene alcohols, alkylamines, acids, lactones, and phenol and its derivatives. Among the properties included in this study, we found that performance was quite good for sulfides, alcohols-phenol, ketones, ester-lactone, and amines. However, our results indicate that other properties, such as whether odorants contain a carboxylic acid group or their bond saturation, could also be very important.

Olfactory coding
There are arguments for spatial, temporal, and spatio-temporal coding in the OB (cf. Leon and Johnson, 2009). Haddad et al. (2010) argued that, for reasons of robustness and speed and in light of experimental evidence, it is plausible that global and local coding schemes work together. Our findings confirm previous studies that suggest labeled-line coding. One hypothesis is that distances between spatial zones have behavioral correspondences, as in the case of carbon chain length, which is related to discrimination ability in human subjects (Laska and Teubner, 1999).
It could be speculated that the observed clustering of properties is an instance of the minimization of wiring length in cortical networks for local processing of feature combinations (cf. Chen et al., 2006). Yaksi and Wilson (2010) provide evidence that local circuitry in the antennal lobe, the insect functional correlate of the OB, could serve for gain control through both contrast enhancement and increased sensitivity. Along these lines, recruitment of a larger area could improve the signal-to-noise ratio and thereby lower detection thresholds, or it could indicate further specialization of the recruited area. Alternatively, a smaller area could indicate specialized local feature detectors. From that perspective, our finding that both amines and sulfur-containing compounds are represented in a small area of glomeruli merits more thorough investigation. In general, further electrophysiological studies could help to corroborate our findings.

Conclusion
We find clustered glomerular representations for many molecular properties in a 2-deoxyglucose autoradiography data set of the rat OB. Among the compared molecular properties, the presence of alkyne, alkane, alkene, and amine causes big changes in the activation maps compared to other properties. Amine, sulfur-containing compound, and alkyne have small zones and high relevance to activation changes, whereas aromatic, alkane, and carboxylic acid showed the biggest zones. Together, the results suggest a local spatial encoding for molecular properties.

Acknowledgments
The first author was supported by a grant from the federal state government of Catalonia (formació de personal investigador, FI) during most of the study. He gratefully acknowledges:
- Bernhard Kaplan, for kind help in proofreading previous drafts and making many corrections;
- Tony Lindeberg, who gave advice during the writing of this paper;
- Timothy Pearce, who made many insightful and critical comments.
Both reviewers made many helpful and thoughtful comments that pushed the writing of this paper much further. We further thank Miquel Tarzan for implementing the SVM used in the study. Daniel Calvo assisted in the selection of appropriate chemical properties for this study. The authors thank the group around Michael Leon and Brett Johnson at the University of California, Irvine, for collecting and providing the data.