An encoding framework for binarized images using hyperdimensional computing

Introduction: Hyperdimensional Computing (HDC) is a brain-inspired and lightweight machine learning method. It has received significant attention in the literature as a candidate to be applied in the wearable Internet of Things, near-sensor artificial intelligence applications, and on-device processing. HDC is computationally less complex than traditional deep learning algorithms and typically achieves moderate to good classification performance. A key aspect that determines the performance of HDC is encoding the input data to the hyperdimensional (HD) space.

Methods: This article proposes a novel lightweight approach relying only on native HD arithmetic vector operations to encode binarized images that preserves the similarity of patterns at nearby locations by using point of interest selection and local linear mapping.

Results: The method reaches an accuracy of 97.92% on the test set for the MNIST data set and 84.62% for the Fashion-MNIST data set.

Discussion: These results outperform other studies using native HDC with different encoding approaches and are on par with more complex hybrid HDC models and lightweight binarized neural networks. The proposed encoding approach also demonstrates higher robustness to noise and blur compared to the baseline encoding.


Introduction
Because of the rising interest in the wearable internet of things (IoT), near-sensor artificial intelligence (AI) applications and on-device processing, there is considerable need for energy-efficient algorithms. Hyperdimensional computing (HDC) has been proposed in the literature as a brain-inspired, light-weight and energy-efficient method because it offers low data requirements [47], robustness to noise [14,54,47], low latency [47] and fast processing [47]. HDC maps input data to a hyperdimensional (HD) space in which information is distributed across thousands of vector elements, mimicking the large number of neurons that store information in our brains. Since HDC uses simple HD arithmetic operations, it is computationally less complex than traditional deep learning (DL). HDC has already been used in several applications, such as speech recognition [13], human activity recognition [19], hand gesture recognition [48,37,59], text classification [42], classification of medical images [25,52], character recognition [35], robotics [40], and time series classification [49].
A crucial aspect that determines the performance of HDC is the encoding of the input data to the HD space, which highly depends on the type of input data. To date, studies have clearly defined how text data [22], numeric data [13,19] and time-series data [48] can be encoded in a simple way using the HD arithmetic operations. However, what is still missing in the literature is a uniform framework to encode (binarized) images. Therefore, this article proposes a novel light-weight HD approach to encode binarized images relying only on native HD arithmetic vector operations. In this respect, the current article brings forward the following novelties:
1. Local linear mapping is introduced as a novel mapping method for numeric data, whereby nearby numerical values are represented by similar HD vectors, and all other values by orthogonal HD vectors. In addition, we demonstrate its application for encoding positions in 2D images;
2. A parameterized framework for encoding binary images into HD vectors is defined, which uses point of interest (POI) selection as a local feature extraction method and unifies existing approaches for native HD encoding of images;
3. The proposed framework is applied to benchmark data sets, reaching 97.35% classification accuracy on MNIST and 84.12% accuracy on Fashion-MNIST.
The remainder of the article is structured as follows. In the next section, the HDC model for classification is described. Afterwards, section 3 defines the local linear mapping for numeric data and illustrates its application to 2D position encoding. Section 4 provides an overview of encoding approaches for binarized images found in the literature and introduces our parameterized unified framework. The fifth section describes the experiments performed to test the proposed encoding framework, whereafter the results are presented and discussed in the sixth and seventh sections, respectively. The last section presents the conclusions of the article.

Hyperdimensional Computing
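HDC is a mathematical framework that uses HD vectors (i.e., vectors with a very high dimension, typically up to ten thousand, also called hypervectors (HVs)) and simple HD arithmetic vector operations to represent data. The focus of this article is on dense binary HVs (i.e., the elements are 0 or 1 with an equal probability of occurrence of both values) of dimension 10,000 [50]. The analysis of data relies on the similarity between HVs, which is calculated using the normalized Hamming distance between two binary HVs $v_1$ and $v_2$:

$$s(v_1, v_2) = 1 - \frac{h(v_1, v_2)}{D} \tag{1}$$

with $s$ the similarity between $v_1$ and $v_2$, $D$ the dimensionality and $h$ the Hamming distance between $v_1$ and $v_2$:

$$h(v_1, v_2) = \sum_{d=1}^{D} \left| v_1[d] - v_2[d] \right| \tag{2}$$

The HD arithmetic vector operations include:
(a) bundling $\oplus: \mathcal{B} \times \mathcal{H} \rightarrow \mathcal{B}: (B, v) \mapsto B + v$, where $\mathcal{B} = \mathbb{N}^D$ and $\mathcal{H} = \{0, 1\}^D$ (i.e., element-wise addition), after which the bundle $B$ is binarized into the HV $v$ with the majority rule $[\cdot]: \mathcal{B} \rightarrow \mathcal{H}: B \mapsto v$ according to:

$$v[d] = \begin{cases} 1 & \text{if } B[d] > n/2 \\ 0 & \text{if } B[d] < n/2 \\ \text{rand}(0, 1) & \text{if } B[d] = n/2 \end{cases} \tag{3}$$

with $n$ the number of HVs bundled in $B$, and where rand(0, 1) means that the component $v[d]$ is randomly assigned to 0 or 1 in the presence of ties;
(b) binding $\otimes: \mathcal{H} \times \mathcal{H} \rightarrow \mathcal{H}$ (element-wise XOR in binary HDC);
(c) permutation $\rho: \mathcal{H} \rightarrow \mathcal{H}$ (e.g., cyclic shift in binary HDC).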
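The operations above can be illustrated with a minimal sketch in plain Python. It assumes that binding is element-wise XOR and that permutation is a cyclic shift; the function names (`random_hv`, `similarity`, `bind`, `permute`, `bundle`) are illustrative and not taken from the article.

```python
import random

D = 10_000  # dimensionality used in the article


def random_hv(rng, d=D):
    """Dense binary HV: each element is 0 or 1 with equal probability."""
    return [rng.randint(0, 1) for _ in range(d)]


def similarity(v1, v2):
    """s = 1 - h / D, with h the Hamming distance between v1 and v2."""
    h = sum(a != b for a, b in zip(v1, v2))
    return 1 - h / len(v1)


def bind(v1, v2):
    """Binding, assumed here to be element-wise XOR."""
    return [a ^ b for a, b in zip(v1, v2)]


def permute(v, shift=1):
    """Permutation, assumed here to be a cyclic shift by `shift` positions."""
    shift %= len(v)
    return v[-shift:] + v[:-shift] if shift else v[:]


def bundle(hvs, rng):
    """Element-wise addition followed by the majority rule (random tie-breaks)."""
    n = len(hvs)
    out = []
    for column in zip(*hvs):
        total = sum(column)
        if 2 * total > n:
            out.append(1)
        elif 2 * total < n:
            out.append(0)
        else:
            out.append(rng.randint(0, 1))
    return out


rng = random.Random(0)
a, b, c = (random_hv(rng) for _ in range(3))
print(similarity(a, b))                        # near 0.5: random HVs are pseudo-orthogonal
print(similarity(bundle([a, b, c], rng), a))   # clearly above 0.5: a bundle resembles its inputs
print(similarity(bind(a, b), a))               # near 0.5: a bound pair resembles neither input
```

With XOR as binding, binding twice with the same HV recovers the original, which is what makes bound pairs invertible.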
Figure 1 gives a schematic overview of the HDC framework, in which two main building blocks can be distinguished: an encoder and a classifier. The encoder is responsible for mapping the input to an HV. Typically, it maps each input value of a sample to an atomic HV that is stored in (continuous) item memories ((C)IM). This procedure is called mapping and will be explained in section 3. Then, different atomic HVs are combined to obtain one sample HV for each input. Commonly, an input sample $f$ having $n$ features is encoded with the so-called record-based encoding [12,29,43], as shown in Figure 2: each feature ($j = 1 \ldots n$) is assigned a random HV to represent the feature ID, which is stored in an IM. Feature values are translated into HVs with a CIM that is created with linear mapping (see section 3.2) [23,48]. Next, each feature ID HV $v_j$ is bound with the HV representing its value $v_{f[j]}$. Finally, these ID-value bound pairs of all features are bundled together to form the sample bundle $S$ by initializing and bundling each bound pair $v_{f[j]} \otimes v_j$ one at a time:

$$S_0 = \mathbf{0} \tag{4}$$

$$S_j = S_{j-1} \oplus \left( v_{f[j]} \otimes v_j \right), \quad j = 1, \ldots, n \tag{5}$$

The sample bundle $S$ is then simply:

$$S = S_n \tag{6}$$

For notation purposes, this iterative bundling (Equations 4-6) will be written in short as:

$$S = \bigoplus_{j=1}^{n} \left( v_{f[j]} \otimes v_j \right) \tag{7}$$

Finally, the sample bundle is binarized into the HV $s = [S]$ with the majority rule (Equation 3). Since the encoder is a crucial part of the system and a uniform framework to encode (binarized) images is still lacking in the literature, we propose a novel encoding framework (section 4).
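As a rough illustration of record-based encoding, the sketch below binds each feature ID HV with the HV of the feature's (already quantized) level and bundles the results. It assumes XOR binding, a reduced dimensionality of 2,000 to keep the demo fast, and hypothetical helper names (`linear_cim`, `encode_record`).

```python
import random


def random_hv(rng, d):
    return [rng.randint(0, 1) for _ in range(d)]


def similarity(v1, v2):
    return 1 - sum(a != b for a, b in zip(v1, v2)) / len(v1)


def linear_cim(levels, d, rng):
    """Level HVs via linear mapping: flip (d/2)/(levels-1) fresh bits per level."""
    flips = (d // 2) // (levels - 1)
    order = rng.sample(range(d), d // 2)  # every position is flipped at most once
    hv = random_hv(rng, d)
    cim = [hv[:]]
    for level in range(1, levels):
        for i in order[(level - 1) * flips:level * flips]:
            hv[i] ^= 1
        cim.append(hv[:])
    return cim


def encode_record(sample, id_memory, cim, rng):
    """sample holds one level index per feature; returns the binarized sample HV."""
    d = len(id_memory[0])
    counts = [0] * d  # the sample bundle S, initialized to zero
    for j, level in enumerate(sample):
        value_hv, id_hv = cim[level], id_memory[j]
        for k in range(d):
            counts[k] += value_hv[k] ^ id_hv[k]  # bind, then bundle
    n = len(sample)
    # Majority rule; ties can only occur for an even number of features.
    return [1 if 2 * c > n else 0 if 2 * c < n else rng.randint(0, 1) for c in counts]


rng = random.Random(1)
D, N_FEATURES, LEVELS = 2000, 5, 11
id_memory = [random_hv(rng, D) for _ in range(N_FEATURES)]
cim = linear_cim(LEVELS, D, rng)

s1 = encode_record([0, 5, 10, 3, 7], id_memory, cim, rng)
s2 = encode_record([0, 5, 10, 3, 8], id_memory, cim, rng)  # one feature, one level apart
s3 = encode_record([10, 0, 2, 9, 1], id_memory, cim, rng)  # all features far apart
print(similarity(s1, s2), similarity(s1, s3))
```

Samples that differ by a single level in one feature should end up with nearly identical sample HVs, while samples whose features all differ strongly are noticeably less similar.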
3 Data mapping techniques

Orthogonal mapping
Orthogonal mapping assigns a randomly chosen atomic HV to each possible value present in the data. These random HVs are pseudo-orthogonal due to the high dimensionality, and they converge to exact orthogonality with increasing dimensionality [22]. This type of mapping is suitable for nominal data where each value is independent of the other values.

Linear mapping
In the case of ordinal or discrete data, there is a natural ordering of levels or values such that closer levels should be mapped to more similar HVs than levels further apart, and similarity-preserving HVs are preferred for this type of data. Therefore, linear mapping of levels to atomic HVs is applied [23,48]. Namely, the lowest level is assigned a random atomic HV, whereafter each level's atomic HV is obtained by flipping $\frac{D/2}{L-1}$ bits in the atomic HV of the previous level, where $L$ is the number of levels (without flipping a bit that has already been flipped before). Similarly, continuous data can be mapped to HVs with linear mapping after being quantized into a predefined number of discrete levels.
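A minimal sketch of linear mapping, under the assumption that the bits to flip are drawn once at random and consumed in order so that no bit is flipped twice; `linear_cim` is an illustrative name. With D = 10,000 and L = 21 levels, each step flips 250 bits, so the similarity to the lowest level drops by 0.025 per level and reaches 0.5 at the highest level.

```python
import random


def linear_cim(levels, d, rng):
    """Continuous item memory built with linear mapping.

    The lowest level gets a random HV; each next level flips
    (d/2)/(levels-1) bits that have not been flipped before.
    """
    flips = (d // 2) // (levels - 1)
    order = rng.sample(range(d), d // 2)  # each position is flipped at most once
    hv = [rng.randint(0, 1) for _ in range(d)]
    cim = [hv[:]]
    for level in range(1, levels):
        for i in order[(level - 1) * flips:level * flips]:
            hv[i] ^= 1
        cim.append(hv[:])
    return cim


def similarity(v1, v2):
    return 1 - sum(a != b for a, b in zip(v1, v2)) / len(v1)


values = list(range(-100, 101, 10))  # 21 levels: -100, -90, ..., 100
cim = linear_cim(len(values), 10_000, random.Random(0))
for value in (-100, -90, -30, 100):
    print(value, similarity(cim[0], cim[values.index(value)]))
```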
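As an example, Figure 3 illustrates the application of linear mapping for a feature with discrete values ranging from -100 to 100 in steps of 10, and thus 21 levels. It shows the similarity of values to the lowest level (feature value = −100), which decreases linearly until orthogonality (similarity = 0.5), and the similarity of values to the feature value of −30, which decreases linearly for smaller and larger feature values.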

Local linear mapping
Encoding numeric data with the original linear mapping results in small differences between the HVs of two adjacent values when working with a relatively large number of levels, and even values that are far apart are always somewhat similar (s > 0.5). Therefore, we introduce local linear mapping, which divides the range of values into $S$ splits such that a smaller number of HVs (i.e., $\frac{L-1}{S} + 1$ HVs) is present in each split, to which linear mapping can be applied. The upper edge vector of the previous split is used as the starting vector for linear mapping in the following split. Consequently, two adjacent values within one split will have a larger difference in HVs (i.e., $\frac{D/2}{(L-1)/S}$ different bits) compared to when applying the original linear mapping to the whole range of values. Additionally, an HV will be similar to HVs within a certain range of it and dissimilar, thus approximately orthogonal, to all HVs outside that range. As a result, small differences in values are emphasized and large differences are ignored. Note that local linear mapping with 1 split or $L$ splits corresponds to the original linear mapping and orthogonal mapping, respectively. Figure 4 illustrates the concept of the proposed local linear mapping with 4 splits and thus 6 vectors in one split since there are 21 levels (i.e., $\frac{21-1}{4} + 1 = 6$). In each of the four splits (e.g., between the edge vectors for values -100 and -50), the original linear mapping is applied. Two adjacent values within one split will be highly similar; an HV will be similar to vectors at nearby positions to the left and right; and an HV is orthogonal to vectors further to the left and right.
Local linear mapping has some resemblance to a technique introduced by Rachkovskij et al. [45] and Neubert et al. [39] for encoding position in images, which concatenates orthogonal edge vectors to obtain the position vectors within one split. The ratio of concatenation depends on the distance of the considered pixel to both edge vectors. However, the decrease in similarity for pixels further away from the considered pixel is not as gradual as with the proposed local linear mapping. This is shown in Figure 5, which illustrates the similarity between the position HV of every pixel and the position HV of the pixel at location (21,11) for an image of size 28x28. The position HVs are all encoded as $v_x \otimes v_y$, of which the x and y positions are mapped to vectors $v_x$ and $v_y$ using the different types of mapping: (1) orthogonal mapping, (2) linear mapping [23,48], (3) the concatenation approach of Rachkovskij et al. [45] and Neubert et al. [39] using 10 edge vectors and (4) our proposed local linear mapping using 9 splits and thus also 10 edge vectors. We believe that the decrease in similarity for local linear mapping in Figure 5d is more intuitive than for the concatenation approach [39,45] (Figure 5c). Furthermore, local linear mapping builds on the concept of linear mapping, which is commonly used in HDC encoding approaches. In this respect, it is also similar in concept to float code, which makes the similarity decay local but builds on thermometer code [45,7].
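The construction described in this section can be sketched as follows. This is one possible reading, assuming that a fresh random set of D/2 positions is drawn for each split and that (L−1) is divisible by S; `local_linear_cim` is an illustrative name. In this sketch, S = 1 reproduces linear mapping, and the orthogonal limit is reached when each split holds only its two edge vectors.

```python
import random


def similarity(v1, v2):
    return 1 - sum(a != b for a, b in zip(v1, v2)) / len(v1)


def local_linear_cim(levels, splits, d, rng):
    """Local linear mapping: linear mapping restarted in every split.

    Each split starts from the upper edge vector of the previous split and
    flips (d/2) / ((levels-1)/splits) bits per level, so consecutive edge
    vectors differ in d/2 bits.
    """
    steps_per_split = (levels - 1) // splits
    flips = (d // 2) // steps_per_split
    hv = [rng.randint(0, 1) for _ in range(d)]
    cim = [hv[:]]
    for _ in range(splits):
        order = rng.sample(range(d), d // 2)  # fresh flip positions per split
        for step in range(steps_per_split):
            for i in order[step * flips:(step + 1) * flips]:
                hv[i] ^= 1
            cim.append(hv[:])
    return cim


# Setting of the Figure 4 example: 21 levels, 4 splits, 6 vectors per split.
cim = local_linear_cim(levels=21, splits=4, d=10_000, rng=random.Random(0))
for level in (1, 4, 5, 6, 15):
    print(level, similarity(cim[0], cim[level]))
```

Adjacent levels now differ in 1,000 bits instead of 250, and levels that lie one or more full splits apart are approximately orthogonal.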

Encoding techniques for binary images

4.1 Related Work
Several ways to encode binarized images with HDC have been proposed in the literature and can be divided into two main categories: (1) native HDC, i.e., end-to-end use of native HD vector operations (from raw pixel to output) and (2) hybrid HDC, i.e., external feature extraction methods are used in combination with HDC. Table 1 gives an overview of the different encoding approaches which are discussed in the following section.

Table 1: Summary of the already proposed approaches for the encoding of binarized images. The table includes a short description of the type of encoding, the references where the encoding is discussed and the formula to obtain the encoded HV of the image. The symbols used in this table are listed in the appendix (Table A1).
The native HDC encoding methods can be further divided into two categories depending on whether position is encoded while preserving similarity between nearby positions (i.e., linearly mapped) or not (i.e., orthogonally mapped).

(c) Combination of permutation and binding
In analogy to the n-gram encoding in language identification applications [46], Khaleghi et al. [18] apply a sliding window of length n to the image. The window is then encoded by binding all pixel value HVs, which are permuted based on their position in the window, i.e., the first pixel value HV is not permuted, the second is permuted once, the third is permuted twice, etc. This could be seen as extracting local features from the image. To account for the global position of these features in the image, each window HV is bound with a random position HV.
The encoding approaches mentioned so far represent similar pixels at nearby positions by dissimilar HVs, because a permuted HV is dissimilar to its original and because the position HVs are orthogonal. Hence, these encoding approaches do not preserve similarity, which might be crucial to solve an image classification task.
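The window encoding described above can be sketched as follows, assuming XOR binding, cyclic-shift permutation and a window over binary pixel values; `encode_window` and the variable names are illustrative and do not reproduce the implementation of Khaleghi et al. [18].

```python
import random


def similarity(v1, v2):
    return 1 - sum(a != b for a, b in zip(v1, v2)) / len(v1)


def permute(hv, times):
    """Cyclic shift by `times` positions (assumed permutation)."""
    times %= len(hv)
    return hv[-times:] + hv[:-times] if times else hv[:]


def bind(v1, v2):
    return [a ^ b for a, b in zip(v1, v2)]


def encode_window(window, value_im):
    """Bind the pixel value HVs, the i-th one permuted i times (i = 0, 1, ...)."""
    out = [0] * len(value_im[0])  # all-zero HV is the identity for XOR binding
    for i, pixel in enumerate(window):
        out = bind(out, permute(value_im[pixel], i))
    return out


rng = random.Random(0)
D = 10_000
value_im = [[rng.randint(0, 1) for _ in range(D)] for _ in range(2)]  # HVs for 0 and 1
position_hv = [rng.randint(0, 1) for _ in range(D)]                   # random position HV

w1 = bind(encode_window([1, 0, 1], value_im), position_hv)
w2 = bind(encode_window([1, 1, 1], value_im), position_hv)  # one pixel differs
print(similarity(w1, w2))  # near 0.5 despite the windows being almost identical
```

The near-orthogonality of two windows that differ in a single pixel illustrates why this family of encodings does not preserve similarity.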
(B) Linearly mapped position vectors
Gallant et al. [9], Kussul et al. [28], and Weiss et al. [53] apply linear mapping such that nearby x and y positions are represented by similar HVs. The image is then encoded using the binding operation for a 2D image, as mentioned in section 4.1.1(A,b).
An alternative approach to preserve similarity for nearby positions is proposed by Frady et al. [8], Komer et al. [26], and Voelker et al. [51], who make use of fractional binding. For this, two random HVs $\mathbf{x}$ and $\mathbf{y}$ are assigned to represent the x- and y-axis, respectively. The (x, y) position is then constructed as $\mathbf{x}^x \otimes \mathbf{y}^y$, where $\mathbf{x}^x = \bigotimes_{n=1}^{x} \mathbf{x}$, i.e., the HV $\mathbf{x}$ is repeatedly bound with itself x times. This bound pair representing the position is then bound with the pixel value HV $v_{I_{bin}[x,y]}$. However, this type of position encoding cannot be applied to binary HVs since all even (or odd) positions would be represented by the same HV.
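The limitation for binary HVs follows from XOR being its own inverse. A short check, assuming XOR binding and an illustrative helper `self_bind`:

```python
import random


def self_bind(base, k):
    """Bind `base` with itself so that it appears k times in total (k >= 1)."""
    out = base[:]
    for _ in range(k - 1):
        out = [a ^ b for a, b in zip(out, base)]
    return out


rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(1000)]

odd_positions = [self_bind(x, k) for k in (1, 3, 5)]
even_positions = [self_bind(x, k) for k in (2, 4, 6)]
print(all(hv == x for hv in odd_positions))               # every odd power equals x
print(all(sum(hv) == 0 for hv in even_positions))         # every even power is all zeros
```

Because only two distinct HVs can ever be produced, the repeated self-binding carries no usable position information in the binary case.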

Hybrid HDC
Instead of encoding the raw image by means of HD vector operations, external non-HD-based feature extraction methods are used. These approaches can be subdivided into two categories: (a) those that use the output layer of a neural network (NN) or cellular automata (CA) as a single feature HV to represent the image [15,25,57,61]; and (b) those that use external methods (NN or other) to extract multiple features which are encoded via the record-based encoding (Figure 2) [29].

Proposed unified framework
Figure 6 gives an overview of the proposed approach to encode binarized images which can be divided into four steps: (1) binarization, (2) POI selection and patch creation around POIs, (3) patch vector encoding, and (4) image vector encoding.
(1) Binarization. As a first step, the pixel values of an input image $I$ are binarized using a predefined binarization threshold $T_{bin}$:

$$I_{bin}[x, y] = \begin{cases} 1 & \text{if } I[x, y] > T_{bin} \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

(2) POI selection and patch creation around POIs. Points of interest (POIs) are selected as pixels with $I_{bin}[x, y] = 1$. Thereafter, a square patch $P$ of predefined size $z$ is drawn around each POI (in Figure 6, z = 3).
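Steps (1) and (2) can be sketched as below. The treatment of patches that extend beyond the image border is not specified in the text; zero padding is assumed here, and the function names are illustrative.

```python
def binarize(image, t_bin=0):
    """Pixel becomes 1 if its value exceeds the threshold, else 0."""
    return [[1 if pixel > t_bin else 0 for pixel in row] for row in image]


def select_pois(binary):
    """POIs are the (x, y) coordinates of all pixels equal to 1."""
    return [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v == 1]


def patch_around(binary, x, y, z):
    """Square z-by-z patch centred on (x, y); pixels outside the image count as 0
    (zero padding is an assumption of this sketch)."""
    r = z // 2
    h, w = len(binary), len(binary[0])
    return [
        [binary[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0
         for xx in range(x - r, x + r + 1)]
        for yy in range(y - r, y + r + 1)
    ]


image = [
    [0, 0, 0, 0],
    [0, 200, 35, 0],
    [0, 0, 90, 0],
    [0, 0, 0, 0],
]
binary = binarize(image, t_bin=0)
pois = select_pois(binary)
patches = {poi: patch_around(binary, poi[0], poi[1], z=3) for poi in pois}
print(pois)
print(patches[(1, 1)])
```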
(3) Patch vector encoding. Each pixel in the patch is encoded as the binding of three vectors: the HV representing its binary value $P[x, y]$ (stored in an IM, with one random vector for value 0 and another random vector for value 1), the HV corresponding to its x position in the patch and the one for its y position in the patch. The x and y position HVs are stored in two separate CIMs ($CIM_{x,z}$ and $CIM_{y,z}$), both containing $z$ vectors that are mapped with orthogonal mapping. The resulting patch vector $v_P$ for a POI is then obtained by bundling all pixel vectors and binarizing the obtained bundle:

$$v_P = \left[ \bigoplus_{(x, y) \in P} v_{P[x,y]} \otimes v_x \otimes v_y \right] \tag{9}$$

for all $(x, y) \in P$, with $v_x \in CIM_{x,z}$ and $v_y \in CIM_{y,z}$. The encoding of patch vectors around POIs can be seen as extracting local features of the image in analogy to Kussul et al. [27], Kussul et al. [30] and Curtidor et al. [4], but here only native HD arithmetic operations are used instead of relying on an NN-based feature extractor.
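Patch vector encoding can be sketched as follows, assuming XOR binding and random (orthogonal) HVs for the positions inside the patch; `encode_patch` and the memory names are illustrative.

```python
import random


def random_hv(rng, d):
    return [rng.randint(0, 1) for _ in range(d)]


def similarity(v1, v2):
    return 1 - sum(a != b for a, b in zip(v1, v2)) / len(v1)


def encode_patch(patch, value_im, cim_x, cim_y, rng):
    """Bind value, x and y HVs for every pixel, bundle them and binarize."""
    d = len(value_im[0])
    counts, n = [0] * d, 0
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            v, vx, vy = value_im[value], cim_x[x], cim_y[y]
            for k in range(d):
                counts[k] += v[k] ^ vx[k] ^ vy[k]  # bind, then add to the bundle
            n += 1
    # Majority rule; ties are only possible for an even number of pixels.
    return [1 if 2 * c > n else 0 if 2 * c < n else rng.randint(0, 1) for c in counts]


D, Z = 10_000, 3
rng = random.Random(0)
value_im = [random_hv(rng, D) for _ in range(2)]  # one HV for value 0, one for value 1
cim_x = [random_hv(rng, D) for _ in range(Z)]     # orthogonal mapping inside the patch
cim_y = [random_hv(rng, D) for _ in range(Z)]

stroke = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
nearly = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]        # one pixel differs
inverted = [[1, 0, 1], [1, 0, 1], [1, 0, 1]]      # every pixel differs

hv_stroke = encode_patch(stroke, value_im, cim_x, cim_y, rng)
hv_nearly = encode_patch(nearly, value_im, cim_x, cim_y, rng)
hv_inverted = encode_patch(inverted, value_im, cim_x, cim_y, rng)
print(similarity(hv_stroke, hv_nearly), similarity(hv_stroke, hv_inverted))
```

Patches that share most of their pixels yield similar patch vectors, while a patch in which every pixel differs is approximately orthogonal.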
(4) Image vector encoding. After obtaining the patch vectors of all POIs, each individual patch vector is bound with the HVs representing the corresponding POI's x and y position in the original image $I$ (stored in $CIM_{x,w}$ and $CIM_{y,h}$) to capture the global positional information of the extracted local features. The binarized bundling of all these patch vectors, each bound with its POI's position, results in the image vector $v_I$:

$$v_I = \left[ \bigoplus_{(x, y):\, I_{bin}[x,y] = 1} v_{P_{x,y}} \otimes v_x \otimes v_y \right] \tag{10}$$

where $v_{P_{x,y}}$ is the patch vector of the POI at position (x, y), $v_x \in CIM_{x,w}$ and $v_y \in CIM_{y,h}$. The $CIM_{x,w}$ and $CIM_{y,h}$ are mapped with our proposed local linear mapping instead of the original linear mapping (see section 3.3) to capture small dependencies in position while ignoring large ones.
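The four steps can be combined into one small end-to-end sketch. It is an illustration under several assumptions rather than the authors' implementation: XOR binding, zero padding at the border, a fresh flip set per split in the local linear mapping, a reduced dimensionality of 3,000 and a 13x13 toy image. All names are hypothetical.

```python
import random


def random_hv(rng, d):
    return [rng.randint(0, 1) for _ in range(d)]


def similarity(v1, v2):
    return 1 - sum(a != b for a, b in zip(v1, v2)) / len(v1)


def local_linear_cim(levels, splits, d, rng):
    """Position HVs with local linear mapping (fresh flip positions per split)."""
    steps = (levels - 1) // splits
    flips = (d // 2) // steps
    hv = random_hv(rng, d)
    cim = [hv[:]]
    for _ in range(splits):
        order = rng.sample(range(d), d // 2)
        for step in range(steps):
            for i in order[step * flips:(step + 1) * flips]:
                hv[i] ^= 1
            cim.append(hv[:])
    return cim


def bundle_bound(triples, d, rng):
    """XOR-bind each (a, b, c) triple, bundle all results and apply the majority rule."""
    counts = [0] * d
    for a, b, c in triples:
        for k in range(d):
            counts[k] += a[k] ^ b[k] ^ c[k]
    n = len(triples)
    return [1 if 2 * c > n else 0 if 2 * c < n else rng.randint(0, 1) for c in counts]


def encode_image(binary, z, mem, rng):
    """Encode a binarized image: patch vectors around POIs bound with POI positions."""
    h, w, r, d = len(binary), len(binary[0]), z // 2, len(mem["value"][0])
    terms = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1:
                continue  # only POIs contribute to the image vector
            pixels = []
            for py in range(z):
                for px in range(z):
                    yy, xx = y + py - r, x + px - r
                    inside = 0 <= yy < h and 0 <= xx < w
                    value = binary[yy][xx] if inside else 0  # zero padding (assumed)
                    pixels.append((mem["value"][value], mem["patch_x"][px], mem["patch_y"][py]))
            patch_hv = bundle_bound(pixels, d, rng)
            terms.append((patch_hv, mem["image_x"][x], mem["image_y"][y]))
    return bundle_bound(terms, d, rng)


D, SIZE, Z, S = 3000, 13, 3, 4
rng = random.Random(0)
mem = {
    "value": [random_hv(rng, D) for _ in range(2)],
    "patch_x": [random_hv(rng, D) for _ in range(Z)],
    "patch_y": [random_hv(rng, D) for _ in range(Z)],
    "image_x": local_linear_cim(SIZE, S, D, rng),
    "image_y": local_linear_cim(SIZE, S, D, rng),
}

stroke = [[0] * SIZE for _ in range(SIZE)]
shifted = [[0] * SIZE for _ in range(SIZE)]
far = [[0] * SIZE for _ in range(SIZE)]
for y in range(1, 6):
    stroke[y][2] = 1    # vertical stroke in the upper left
    shifted[y][3] = 1   # the same stroke moved one pixel to the right
for x in range(7, 12):
    far[10][x] = 1      # horizontal stroke in the lower right

hv_stroke = encode_image(stroke, Z, mem, rng)
hv_shifted = encode_image(shifted, Z, mem, rng)
hv_far = encode_image(far, Z, mem, rng)
print(similarity(hv_stroke, hv_shifted), similarity(hv_stroke, hv_far))
```

In this toy setting, shifting the stroke by one pixel keeps the image vectors similar, whereas a pattern located several splits away is approximately orthogonal, which is the behavior local linear mapping is meant to provide.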

Experiments
The proposed approach to encode binarized images is tested on two well-known, publicly available data sets:
(1) MNIST data set [31]. This data set includes 70,000 28x28 gray scale images of ten different handwritten digits.
(2) Fashion-MNIST data set [55]. This data set contains 7,000 28x28 gray scale images of fashion products for each of ten categories, i.e., 70,000 images in total.
Both data sets are split into a training set of 60,000 images (6,000 for each class) and a test set of 10,000 images (1,000 for each class). The pixel values range from 0 to 255.

Local linear mapping
At first, the concept of local linear mapping is tested using pixel-wise encoding on the whole image, without using POI encoding. The image is thus encoded as:

$$v_I = \left[ \bigoplus_{x=1}^{w} \bigoplus_{y=1}^{h} v_{I_{bin}[x,y]} \otimes v_x \otimes v_y \right] \tag{11}$$

The number of splits $S$ in the CIMs storing $v_x$ and $v_y$ is treated as a hyperparameter and is set to 1, 3, 5, 7, 9 and 28. Of these, 9 is the maximal number of splits possible for a 28x28 image, since otherwise only 2 vectors would be in a particular split, which would thus be orthogonal. Note again that using only 1 split corresponds to the traditional linear mapping and will be treated as the baseline HDC framework, and that using 28 splits corresponds to orthogonal mapping. The images are binarized following Equation 8 with the binarization threshold equal to zero (i.e., $T_{bin} = 0$).

Proposed unified framework
In the second part of the experiments, the local linear mapping is applied together with the POI encoding. This encoding approach requires determining the settings of two hyperparameters: the number of splits for local linear mapping $S$ and the patch size $z$ around each POI. As will be presented in more detail in section 6.1, an increase in performance is seen when the number of splits $S$ is increased from 1 to 9 in the first part of the experiments (section 5.1). Hence, only the cases S = 1, S = 9 and S = 28 will be tested in this second part of the experiments, since we expect the same increasing trend between 1 and 9 as in the first part. The tested settings for the patch size are 3, 5 and 7. In summary, all possible combinations of the following settings of the two hyperparameters will be tested: S = {1, 9, 28} and z = {3, 5, 7}. The images are again binarized following Equation 8 with the binarization threshold equal to zero (i.e., $T_{bin} = 0$).

Evaluation
The different combinations of settings are tested by means of 10-fold cross validation (CV) on the training set. This means that the 60,000 training images are split into ten parts. The algorithm is trained on 54,000 images and validated on the remaining 6,000 images, which is repeated ten times while each time taking a different set of 6,000 validation images.
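The fold construction can be sketched as below. Shuffling the indices before splitting is an assumption of this sketch, since the text does not state how the ten parts are formed; `k_fold_indices` is an illustrative name.

```python
import random


def k_fold_indices(n_samples, k, rng):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)  # assumption: folds are drawn at random
    fold_size = n_samples // k
    for i in range(k):
        validation = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, validation


folds = list(k_fold_indices(60_000, 10, random.Random(0)))
for train, validation in folds[:2]:
    print(len(train), len(validation))  # 54000 training and 6000 validation images
```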
The training procedure is performed iteratively for a maximum of 1,000 iterations while saving the classifier with the best accuracy. After every 100 iterations, it is evaluated whether this best training accuracy exceeds 99%. If this is the case, the training procedure is terminated and the classifier with the best accuracy is used on the validation set. The performance of the HDC classifier for each combination of hyperparameter settings is documented as the average validation accuracy over the ten runs of the 10-fold CV. The combination of hyperparameter settings yielding the largest average validation accuracy is then selected, used to train on the entire training set (i.e., all 60,000 images) and tested on the 10,000 test images for ten independent runs. Finally, the average test accuracy over these ten independent runs is calculated.
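The stopping rule of the training procedure can be sketched independently of the training method itself. Here `step` is a hypothetical callback that performs one training iteration and returns the current classifier together with its training accuracy; how the classifier is updated is not covered by this sketch.

```python
def train_with_early_stopping(step, max_iterations=1000, check_every=100, target=0.99):
    """Keep the best classifier; stop at a checkpoint once its accuracy exceeds `target`."""
    best_model, best_accuracy = None, -1.0
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        model, accuracy = step(iteration)
        if accuracy > best_accuracy:
            best_model, best_accuracy = model, accuracy
        if iteration % check_every == 0 and best_accuracy > target:
            break
    return best_model, best_accuracy, iteration


def toy_step(iteration):
    """Stand-in for a real training iteration: accuracy grows until it saturates."""
    return f"model-{iteration}", min(1.0, iteration / 150)


model, accuracy, stopped_at = train_with_early_stopping(toy_step)
print(model, accuracy, stopped_at)
```

In the toy run, the best accuracy is first reached at iteration 150, but training only stops at the next checkpoint, iteration 200, mirroring the check after every 100 iterations.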

Robustness analysis
To test the robustness of the proposed encoding approach to noise and blur, we use the MNIST-C data set, which was proposed as a robustness benchmark for computer vision by Mu et al. [38]. This data set includes the 60,000 training and 10,000 test images of the original MNIST data set [31] to which several different corruptions are applied. Of these, shot noise, impulse noise, glass blur, motion blur and spatter are of particular interest in the current article to test noise and blur robustness. The HDC model with the proposed encoding is trained on the original 60,000 training images (i.e., without corruptions) with the baseline setting of hyperparameters (S = 1 and no POI selection) and with the setting yielding the best validation accuracy after 10-fold CV. Both trained HDC classifiers are then tested on the five selected corrupted test sets of 10,000 images, for which a test accuracy averaged over ten independent runs is calculated.

Local linear mapping
The results of the experiments testing the effect of the number of splits in local linear mapping are presented in Table 2. The table includes the accuracy on the training set, the accuracy on the validation set and the number of iterations needed to reach the maximal accuracy on the training set, averaged over the ten folds of the 10-fold CV for both the MNIST and Fashion-MNIST data sets. As mentioned previously, the number of splits equal to 1 (S = 1) is treated as our baseline since this uses neither local linear mapping nor POI encoding. The baseline average validation accuracy is 60.78% for MNIST and 62.65% for Fashion-MNIST.
An increase in performance is seen when increasing the number of splits used in local linear mapping from 1 to 9. The largest validation accuracy is 93.21% for MNIST for S = 9 and 80.98% for Fashion-MNIST for S = 28, which is an increase of 32.43 and 18.33 percentage points, respectively. In the case of MNIST, the classifier with orthogonal mapping (S = 28) reaches an accuracy that lies in between the baseline and the largest obtained accuracy, while this setting yields the highest accuracy for Fashion-MNIST.

Proposed unified framework
Table 3 includes the results showing the effect of the hyperparameters (i.e., the number of splits S in local linear mapping and the patch size z in POI encoding) for our proposed encoding approach. The table again includes the accuracy on the training set, the accuracy on the validation set and the number of iterations needed to reach the maximal accuracy on the training set, averaged over the ten folds of the 10-fold CV for both the MNIST and Fashion-MNIST data sets.
Similar to the previous section, the validation accuracy is much larger in the case of S = 9 compared to S = 1, i.e., 96.48% to 97.05% versus 78.22% to 91.33% for MNIST and 84.30% to 85.30% versus 66.70% to 77.54% for Fashion-MNIST. This is also seen for S = 28, which shows accuracies in the range of 93.27% to 94.41% for MNIST and 79.60% to 83.98% for Fashion-MNIST. An increase in performance is also seen with increasing patch size z.
The best achieved validation accuracy is 97.05% for MNIST with S = 9 and z = 7 and 85.30% for Fashion-MNIST with S = 9 and z = 5. This corresponds to an increase in performance of 36.27 percentage points for MNIST and 22.65 percentage points for Fashion-MNIST compared to their baseline accuracy (S = 1 in Table 2). These settings for the two hyperparameters yielding the best validation accuracy are used to test the HDC classifier on the test set in the next section.

Evaluation on the test set
Table 4 shows the results obtained when setting the hyperparameters to the values yielding the best validation accuracy obtained in the previous section. The table shows the accuracy on the entire training set, the accuracy on the unseen test set and the number of iterations needed to obtain the best training accuracy, averaged over ten independent runs. An average accuracy of 97.35% is reached on the test set of MNIST. For the Fashion-MNIST data set, an average test accuracy of 84.12% is obtained.

Robustness analysis
Table 5 contains the results obtained during the analysis of robustness to noise and blur. The table includes the accuracy on the original and five selected corrupted test sets, averaged over ten independent runs for the MNIST-C data set with the hyperparameters set to the baseline setting (S = 1 and no POI selection) and to the setting yielding the best validation accuracy with 10-fold CV (S = 9 and z = 7). The last row of the table contains the average test accuracy across all five corrupted test sets. The best hyperparameter setting achieves an average test accuracy of 72.58%, which is an increase of 39.16 percentage points compared to the baseline setting, which achieves 33.42% average test accuracy.

Analysis of results
The results in Table 2 for pixel-wise encoding show that the proposed local linear mapping for position encoding outperforms linear mapping. More specifically, there is an increase in performance with an increasing number of splits used in local linear mapping. This interesting finding indicates the importance of better discriminating small differences in position in the image rather than large differences. This is a result of the splits in local linear mapping, which represent two positions that are far apart with orthogonal HVs, such that only HVs of close positions are similar. By contrast, in linear mapping, the HVs of both close and far positions have a certain degree of similarity.
Another finding that stands out from the results reported earlier is a remarkable increase in performance when encoding patches around POIs (Table 3) compared to pixel-wise encoding (Table 2). Several factors could explain this observation. Firstly, background pixels are ignored with POI encoding, limiting unnecessary information. Secondly, local features are extracted around each POI such that the local neighborhood of each POI is taken into account.
In addition, employing local linear mapping to encode the global position of POIs in the image improves the performance compared to using linear mapping (Table 3). This finding is in line with the results obtained in Table 2 and can be explained in a similar way as done above.
Finally, the results of the robustness analysis indicate that the proposed encoding approach after hyperparameter selection shows a higher robustness to noise and blur than the baseline HDC encoding approach (Table 5 and Section 7.3).

Table 5: Accuracy (%) on the original and five selected corrupted test sets, averaged over ten independent runs for the MNIST-C data set with the baseline hyperparameters (S = 1 and no POI selection) and the best hyperparameters (S = 9 and z = 7). The last row contains the average test accuracy across all five corrupted test sets for each setting of hyperparameters. Data are mean (± standard deviation).

7.2 Comparison to the state-of-the-art

MNIST data set
Table 6 provides a summary for the comparison of our obtained result for MNIST (i.e., 97.35%) with other studies found in the literature.
The proposed approach of POI encoding with local linear mapping outperforms all methods categorized as native HDC. This includes the methods applying the permutation operation to encode the position of pixels in the flattened image (section 4.1.1(A,a)), i.e., Manabat et al. [35] and Hassan et al. [10] report an accuracy of 79.87% and 86%, respectively.
Our obtained result for MNIST is also better compared to several studies using the binding operation for position encoding in the flattened image (section 4.1.1(A,b)). In addition, the n-gram-based encoding method to extract local features by Khaleghi et al. [18] reaches an accuracy of 94.0%, which we outperform by using local linear mapping instead of orthogonal mapping to encode global positional information.
Hernández-Cano et al. [11] propose OnlineHD, which is able to increase their baseline performance of 91% to 97.5%, which is slightly higher than our obtained accuracy. In OnlineHD, the baseline HDC training procedure is extended by updating the HDC model depending on how similar a sample is to the existing model. As such, the training procedure becomes more complex due to floating-point multiplications.
Other studies use the HDC framework in combination with additional non-HD methods (Hybrid HDC, section 4.1.2), such as elementary CA, which is used to derive the high-dimensional vector by Karvonen et al. [15], resulting in an accuracy of 74.06%. reaches a slightly higher accuracy of 98.09%. With only the latter achieving a slightly higher accuracy than ours, we can conclude that our proposed binary, native HDC method using local linear mapping and POI encoding achieves comparable results with these more complex multi-bit HDC methods.

Fashion-MNIST data set
Table 7 provides a summary for the comparison of our obtained result for Fashion-MNIST (i.e., 84.12%) with other studies found in the literature.
There are not as many studies available for the Fashion-MNIST data set as for MNIST. Duan et al. [5] and Duan et al. [6] report an accuracy of 79.24% and 80.26% for baseline HDC. Using hybrid HDC methods, Yu et al. [58] report an accuracy of 84.0% when using RFF and reach 87.4% using more complex elements in the HVs. Duan et al. [5] and Duan et al. [6] reach a higher accuracy of 85.47% and 87.11% by mapping the HDC model to an equivalent (B)NN. We can conclude that our proposed HDC method outperforms the native HDC methods but achieves a lower accuracy than the hybrid and multi-bit HDC methods.

Robustness analysis
After selecting the hyperparameters yielding the best validation accuracy with 10-fold CV, the proposed encoding approach is more robust to images corrupted with noise and blur compared to the baseline encoding approach (Table 5). Especially for the shot noise and impulse noise corruptions, the average test accuracy is close to the average test accuracy achieved on non-corrupted images.
For spatter, the average test accuracy drops slightly, but the proposed approach is still able to classify around 81% of the test images correctly. The average test accuracy drops the most for the glass blur and motion blur corruptions, where the proposed approach is able to classify respectively 55.84% and 38.46% of the images correctly. Still, this is an improvement of 36.97 percentage points for glass blur and 26.7 percentage points for motion blur compared to the baseline HDC encoding approach. It can therefore be concluded that the HDC classifier with our proposed encoding approach after hyperparameter selection is considerably more robust to noise and blur than the baseline, with an average accuracy of 72.58% across five different corrupted test sets.

Future research
As future work, we envisage evaluating and extending the proposed encoding approach for application to gray scale and color images, investigating the use of hierarchical (multi-layer) patches with HDC encoding and further extensions of the local linear mapping concept for position encoding.
It could also be analysed how the HDC framework can be made even more robust to noise and corruptions such as glass blur and motion blur.

Conclusion
This article introduces a novel light-weight approach to encode binarized images that preserves the similarity of patterns at nearby locations while relying only on native HD arithmetic vector operations, without making use of external methods for feature extraction. The approach uses point of interest selection to derive local features of the image and local linear mapping to encode the location of these local features in the image. After selecting the best settings for the two introduced hyperparameters with 10-fold cross validation, an accuracy of 97.35% is reached on the test set for the MNIST data set and 84.12% for the Fashion-MNIST data set. These results outperform other studies using baseline HDC with different encoding approaches and are on par with more complex hybrid HDC models. The proposed encoding approach also shows a higher robustness to noise and blur compared to the baseline encoding.

Appendix: List of symbols
A summary of notation can be found in Table A1.

Figure 1: Schematic overview of the HDC framework in which two main building blocks can be distinguished: an encoder and a classifier.

Figure 2: Schematic overview of the record-based encoding [12, 29, 43] with an IM for the feature IDs and a CIM for the feature values.

As the second main building block, the classifier has two modes of operation: (1) during training, the sample HVs and associated class labels are used to produce class prototypes; and (2) during inference, a sample HV is compared with each of the class prototypes and the class label is predicted by selecting the class with the highest similarity. Different variants of training methods exist, as reported in our previous work [50].
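The two classifier modes described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes binary hypervectors, bundling by element-wise sum followed by the majority rule, and similarity defined as one minus the normalized Hamming distance. The dimension `D`, the helper names and the toy data are hypothetical.

```python
import random

D = 2000  # hypervector dimension (assumed; HDC typically uses thousands)

def random_hv(rng):
    """Draw a random binary hypervector."""
    return [rng.randint(0, 1) for _ in range(D)]

def similarity(a, b):
    """1 - normalized Hamming distance; 0.5 means (quasi-)orthogonal."""
    return sum(x == y for x, y in zip(a, b)) / D

def train(sample_hvs, labels):
    """Bundle the sample HVs of each class and binarize with the majority rule."""
    sums, counts = {}, {}
    for hv, label in zip(sample_hvs, labels):
        acc = sums.setdefault(label, [0] * D)
        for i, bit in enumerate(hv):
            acc[i] += bit
        counts[label] = counts.get(label, 0) + 1
    return {c: [1 if 2 * s > counts[c] else 0 for s in acc]
            for c, acc in sums.items()}

def predict(prototypes, hv):
    """Return the label of the class prototype most similar to hv."""
    return max(prototypes, key=lambda c: similarity(prototypes[c], hv))

def noisy_copy(hv, flip_fraction, rng):
    """Flip a fixed fraction of bits to imitate within-class variation."""
    flip = set(rng.sample(range(D), int(flip_fraction * D)))
    return [1 - b if i in flip else b for i, b in enumerate(hv)]

# Toy data: two classes, each a noisy cloud around a random centre HV.
rng = random.Random(0)
centers = {"a": random_hv(rng), "b": random_hv(rng)}
samples, labels = [], []
for label, center in centers.items():
    for _ in range(5):
        samples.append(noisy_copy(center, 0.2, rng))
        labels.append(label)

prototypes = train(samples, labels)
query = noisy_copy(centers["b"], 0.2, rng)
predicted = predict(prototypes, query)
```

Here `train` corresponds to mode (1) and `predict` to mode (2). Iterative retraining variants are not covered by this sketch.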
Figure 3 illustrates the application of linear mapping for a feature with discrete values ranging from -100 to 100 with steps of 10 and thus 21 levels. It shows that the similarity of values to the lowest level (feature value = −100) decreases linearly up to orthogonality (similarity = 0.5), and that the similarity of values to the feature value equal to −30 decreases linearly for both smaller and larger feature values.
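The linear similarity profile described here can be reproduced with one common construction of linear mapping, assumed for this sketch: start from a random binary hypervector for the lowest level and flip a fresh, equally sized set of bits for each next level, so that half of all bits have been flipped at the highest level. The dimension of 10000 and the function names are illustrative choices, not values taken from the text.

```python
import random

def linear_mapping(num_levels, dim, rng):
    """Level HVs whose similarity decreases linearly with level distance."""
    flips_per_level = dim // 2 // (num_levels - 1)
    # Each bit position is flipped at most once across all levels.
    order = rng.sample(range(dim), flips_per_level * (num_levels - 1))
    hv = [rng.randint(0, 1) for _ in range(dim)]
    levels = [hv[:]]
    for k in range(num_levels - 1):
        for i in order[k * flips_per_level:(k + 1) * flips_per_level]:
            hv[i] = 1 - hv[i]
        levels.append(hv[:])
    return levels

def similarity(a, b):
    """1 - normalized Hamming distance; 0.5 means orthogonal."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
values = list(range(-100, 101, 10))          # 21 discrete feature values
levels = linear_mapping(len(values), 10000, rng)

sim_to_lowest = [similarity(levels[0], hv) for hv in levels]
sim_to_minus30 = [similarity(levels[values.index(-30)], hv) for hv in levels]
```

With these settings, `sim_to_lowest` falls in equal steps from 1.0 at −100 to 0.5 at 100, and `sim_to_minus30` peaks at −30 and falls off linearly on both sides.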

Figure 3: Example of linear mapping [23, 48] for a feature with discrete values ranging from -100 to 100 with steps of 10 and thus 21 levels. The similarity of each feature value's level hypervector to the lowest level hypervector (feature value = −100) and to the hypervector for the feature value of −30 is shown.

Figure 4: Example of local linear mapping with 4 splits for a feature with discrete values ranging from -100 to 100 with steps of 10 and thus 21 levels and (21 − 1)/4 + 1 = 6 levels in one split. The similarity of each feature value's level hypervector to each edge hypervector (feature value = -100, -50, 0, 50 and 100) is shown.
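The setting of Figure 4 can be sketched as follows. This is an assumption-laden illustration rather than the authors' code: the edge hypervectors are drawn independently at random, and the levels inside a split are obtained by copying a growing, nested set of bits from the next edge hypervector into the previous one. The dimension and all function names are hypothetical.

```python
import random

def random_hv(dim, rng):
    return [rng.randint(0, 1) for _ in range(dim)]

def similarity(a, b):
    """1 - normalized Hamming distance; 0.5 means orthogonal."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def local_linear_mapping(num_levels, num_splits, dim, rng):
    """Linear interpolation between random edge HVs, one split at a time."""
    assert (num_levels - 1) % num_splits == 0
    steps = (num_levels - 1) // num_splits       # steps inside one split
    edges = [random_hv(dim, rng) for _ in range(num_splits + 1)]
    levels = []
    for s in range(num_splits):
        order = rng.sample(range(dim), dim)      # order in which bits are copied
        for j in range(steps):
            taken = set(order[:j * dim // steps])
            levels.append([edges[s + 1][i] if i in taken else edges[s][i]
                           for i in range(dim)])
    levels.append(edges[-1])
    return levels, edges

rng = random.Random(0)
values = list(range(-100, 101, 10))              # 21 levels
levels, edges = local_linear_mapping(len(values), 4, 10000, rng)

# Similarity of every level HV to the first edge HV (feature value = -100).
sim_to_first_edge = [similarity(edges[0], hv) for hv in levels]
```

In this sketch the similarity to an edge hypervector decreases linearly inside the adjacent splits and stays near 0.5 elsewhere, so only nearby feature values remain similar.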

4.1.1 Native HDC
Assume an image I of size w × h is given as input and binarized, denoted here as I_bin. The binarized image is either flattened into an array p of length w * h, where p[x] is the value of the pixel in the array p at position x, or used in its original

(a) Orthogonal mapping. (b) Linear mapping [23, 48]. (c) Concatenation [39, 45]. (d) Local linear mapping.

Figure 5: Similarity of each pixel's position vector to the position vector of the pixel at location (21, 11) for an image of size 28x28. Position vectors are encoded as v_x ⊗ v_y, where the x and y positions are mapped to vectors with (a) orthogonal mapping, (b) linear mapping [23, 48], (c) the concatenation approach of Rachkovskij et al. [45] and Neubert et al. [39] using 10 edge vectors for each axis (dotted lines) and (d) our proposed approach of local linear mapping with 9 splits and thus 10 edge vectors (dotted lines).
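To make the comparison between panels (a) and (b) concrete, the sketch below computes the similarity of every bound position vector v_x ⊗ v_y to the one at location (21, 11). It assumes binary hypervectors with XOR as the binding operator, zero-based pixel indices, and a reduced dimension of 2000 to keep the pure-Python run time short. These choices and all names are illustrative, and panels (c) and (d) are not covered.

```python
import random

D = 2000    # reduced dimension for a fast illustration (assumed)
SIZE = 28   # image width and height

def random_hv(rng):
    return [rng.randint(0, 1) for _ in range(D)]

def bind(a, b):
    """Binding of binary HVs, assumed here to be element-wise XOR."""
    return [x ^ y for x, y in zip(a, b)]

def similarity(a, b):
    return sum(x == y for x, y in zip(a, b)) / D

def orthogonal_mapping(n, rng):
    """An independent random HV per position."""
    return [random_hv(rng) for _ in range(n)]

def linear_mapping(n, rng):
    """Flip a fresh set of bits per position; D/2 bits flipped at the far end."""
    step = D // 2 // (n - 1)
    order = rng.sample(range(D), step * (n - 1))
    hv = random_hv(rng)
    levels = [hv[:]]
    for k in range(n - 1):
        for i in order[k * step:(k + 1) * step]:
            hv[i] ^= 1
        levels.append(hv[:])
    return levels

def similarity_map(x_hvs, y_hvs, ref):
    """Similarity of each bound position HV to the one at ref = (x, y)."""
    ref_hv = bind(x_hvs[ref[0]], y_hvs[ref[1]])
    return [[similarity(ref_hv, bind(x_hvs[x], y_hvs[y]))
             for x in range(SIZE)] for y in range(SIZE)]

rng = random.Random(0)
ref = (21, 11)
orth_map = similarity_map(orthogonal_mapping(SIZE, rng),
                          orthogonal_mapping(SIZE, rng), ref)
lin_map = similarity_map(linear_mapping(SIZE, rng),
                         linear_mapping(SIZE, rng), ref)
```

Each map is indexed as `map[y][x]`. With orthogonal mapping, even the direct neighbour of the reference pixel has a similarity close to 0.5, whereas with linear mapping the similarity decays gradually with distance along each axis.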

Figure 6: Schematic overview of the proposed unified encoding framework for a training sample of the MNIST data set with size 28x28 using a patch size of 3x3 around the POIs (z = 3, h = 28 and w = 28).

H: vector HD space, {0, 1}^D
B: bundle HD space, N^D
⊕: bundling operator
[.]: majority rule
⊗: binding operator
ρ: permutation operator
I: input image
T_bin: binarization threshold
I_bin: binarized image I
p: flattened image I_bin
w: width of image I
h: height of image I
P: patch of I_bin
z: patch size
I_bin[x, y]: value of pixel at position (x, y) in I_bin
p[x]: value of pixel at position x in p
P[x, y]: value of pixel at position (x, y) in patch P
S: number of splits in local linear mapping
ρ^i: permutation applied i times
M: type of mapping in patch
O: orthogonal mapping in patch
L: linear mapping in patch
ρ_X: unique permutation for x-axis in I
ρ_Y: unique permutation for y-axis in I
v_{I_bin[x,y]}: HV representing pixel value I_bin[x, y]
v_{p[x]}: HV representing pixel value p[x]
v_x: HV representing position x in p or I
v_y: HV representing position y in I
x: unique random HV for x-axis in I
y: unique random HV for y-axis in I
v: output HV of (last) layer of a neural network
n: number of features
v_i: HV representing the ith feature
v_{f[i]}: HV representing the value of the ith feature
P: set of (x, y) positions of POIs
IM: IM storing binary pixel values
CIM_{x,z}: CIM storing position vectors for x-axis in patch P with size z
CIM_{y,z}: CIM storing position vectors for y-axis in patch P with size z
CIM_{x,w}: CIM storing position vectors for x-axis in image I with width w
CIM_{y,h}: CIM storing position vectors for y-axis in image I with height h

[20,30,36,44]lly mapped position vectors

(a) Permutation
When considering the flattened image, a unique random HV is assigned to each pixel position in the array p, after which the obtained position HV v_x is shifted by one position if the corresponding pixel value p[x] is one and not shifted if it is zero [10, 21, 24, 35]. To encode the 2D binarized image, two unique permutations ρ_X and ρ_Y are assigned to represent the x- and y-axis of the image, respectively. These permutations are applied x and y times, respectively, to the pixel value HV v_{I_bin[x,y]} [20, 30, 36, 44].
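The permutation encoding of the flattened image can be sketched as below. Only the shift rule comes from the description above. The cyclic shift to the right, the final bundling with the majority rule (ties resolved to zero), the dimension and the tiny 3x3 toy images are assumptions made for illustration.

```python
import random

D = 2000  # hypervector dimension (assumed)

def random_hv(rng):
    return [rng.randint(0, 1) for _ in range(D)]

def shift(hv):
    """Cyclic shift by one position (direction chosen arbitrarily)."""
    return hv[-1:] + hv[:-1]

def similarity(a, b):
    return sum(x == y for x, y in zip(a, b)) / D

def encode_flattened(pixels, position_hvs):
    """Shift each position HV if its pixel is one, then bundle all of them."""
    sums = [0] * D
    for value, hv in zip(pixels, position_hvs):
        contribution = shift(hv) if value == 1 else hv
        for i, bit in enumerate(contribution):
            sums[i] += bit
    n = len(pixels)
    return [1 if 2 * s > n else 0 for s in sums]   # majority rule

rng = random.Random(0)
image_a = [0, 1, 0,
           1, 1, 1,
           0, 1, 0]          # flattened 3x3 binarized image (toy example)
image_b = image_a[:]
image_b[0] = 1               # same image with one pixel changed

position_hvs = [random_hv(rng) for _ in image_a]   # one random HV per position
hv_a = encode_flattened(image_a, position_hvs)
hv_b = encode_flattened(image_b, position_hvs)
```

In this sketch, two images that differ in a single pixel yield sample HVs that are similar but not identical. The 2D variant with ρ_X and ρ_Y is not shown.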

Table 2: Accuracy (%) on the training and validation set and the number of iterations needed to reach the best training accuracy, averaged over the ten folds of 10-fold cross validation for the MNIST and Fashion-MNIST data sets and for the different settings of the number of splits S used in local linear mapping. Data are mean (± standard deviation) and in bold is the best validation accuracy for each data set.

Table 3: Accuracy (%) on the training and validation set and the number of iterations needed to reach the best training accuracy, averaged over the ten folds of 10-fold cross validation for the MNIST and Fashion-MNIST data sets and for the different settings of the number of splits S used in local linear mapping and of the patch size z used in POI encoding. Data are mean (± standard deviation) and in bold is the best validation accuracy for each data set.

Table 4: Accuracy (%) on the full training and unseen test set and the number of iterations needed to reach the best training accuracy, averaged over ten independent runs for the MNIST (S = 9 and z = 7) and Fashion-MNIST (S = 9 and z = 5) data sets. Data are mean (± standard deviation).