An Improved Sum of Squared Difference Algorithm for Automated Distance Measurement

In recent years, as photoelectric conversion efficiency and accuracy have improved, photoelectric sensors have been arranged to simulate binocular stereo vision for 3D measurement, which has become an important distance measurement method. In this paper, an improved sum of squared difference (SSD) algorithm that uses binocular cameras to measure the distance of the vehicle ahead is proposed. First, consistency matching calibration is performed when images are acquired. Then, Gaussian blur is used to smooth the image, and grayscale transformation is performed. Next, the Sobel operator is used to detect the edges of the images. Finally, the improved SSD is used for stereo matching and disparity calculation, and the distance value corresponding to each point is obtained. Experimental results show that the improved SSD algorithm achieved an accuracy of 95.06% in stereo matching and disparity calculation, which meets the requirements of distance measurement.


INTRODUCTION
A photoelectric sensor is a semiconductor device that converts light signals into electrical signals based on the photoelectric effect. In the photoelectric effect, when light irradiates a material, the energy of the photons is absorbed by the electrons of the material, and a corresponding electric effect occurs. Photoelectric sensors are widely used in the detection field because they have the advantages of short response time, long detection distance, and high resolution [1].
In recent years, three-dimensional measurement has been carried out by arranging photoelectric sensors to simulate binocular stereo vision. This method has the advantages of long detection distance, color recognition, and a wide application range, and it has become an important distance measurement method. Binocular stereo vision is an application of machine vision that uses left and right cameras to imitate the human left and right eyes. Based on the parallax principle, the imaging devices obtain two images of the measured object from different positions, and the three-dimensional geometric information of the object is obtained by calculating the position deviation between corresponding points in the two images [2]. Binocular camera ranging has gradually become the most common stereo ranging method [3,4].
To match different parts at the same time, a real-time binocular stereo vision system that used a Field Programmable Gate Array (FPGA) with a fully pipelined design was proposed to improve the processing speed [5]. This approach took advantage of the parallel computing and fast operation of the FPGA, but its versatility was poor and targeted program development was needed. In another approach, high- and low-texture scenes were processed separately: a time-of-flight (TOF) depth camera was used to measure the low-texture scene, and a binocular camera was used to measure the high-resolution scene, so as to improve the accuracy of three-dimensional measurement [6]. This approach required additional TOF sensors, and the hardware design was complicated. In addition, for two sets of binocular vision systems with different accuracy levels, triangulation analysis and spatial plane fitting were proposed to calculate relative poses [7]. This algorithm had a higher accuracy than other methods, but it required four cameras, which was costly.
Algorithms that first recognize the object have also been proposed. For example, Canny edge detection was used to extract the target contour in order to determine the three-dimensional coordinates of the target [8]. Likewise, after the approximate area of the target was located, a robust ellipse fitting strategy was used to determine its precise position [9]. Canny edge detection based on binocular vision was also used to identify and locate the target [10]. However, these algorithms cannot be applied to other occasions. Roll angles of the binocular measurement system were also calculated to compensate for dynamic measurement errors by installing a tilt angle sensor horizontally [11]. This algorithm required additional sensors to assist in attitude detection, which increased the complexity of the system. A method based on a parallel binocular vision system and a similarity judgment function was established by using cluster analysis, and the distance was calculated by combining gradient histogram features with a cascade classifier [12]. This method had a high computational complexity and a slow operation speed. Combined with polar line matching, a multi-line centerline detection stereo matching method was proposed for distance estimation [13]. This algorithm required multiple mathematical model conversions, which was not conducive to the real-time performance of the system.
Therefore, we propose an improved SSD algorithm for automated distance measurement that improves measurement accuracy. The algorithm mainly consists of five components: grayscale transformation, Gaussian blur transformation, edge detection by the Sobel operator, a model of the binocular stereo camera, and the improved Sum of Squared Difference. It can accurately perform stereo matching and distance measurement.

Gray-Scale Transformation
To reduce the amount of calculation and improve real-time performance, the image data are converted to grayscale. Grayscale transformation combines the values of the three color channels into a single gray value. According to the importance of the three primary colors, the R, G, and B components are averaged with different weights, and Eq. 1 gives the grayscale image [14], where i and j are the pixel coordinates in the image, and R(i, j), G(i, j), and B(i, j) are the components of the three primary colors at the point in row i and column j.
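As an illustration of the weighted average described above, the following minimal sketch converts a small RGB image to grayscale. The weights 0.299, 0.587, and 0.114 are an assumption (the widely used BT.601 luma coefficients), not values taken from Eq. 1, and the function name `to_grayscale` is hypothetical.

```python
def to_grayscale(rgb_image, weights=(0.299, 0.587, 0.114)):
    """Weighted average of the R, G, B components of every pixel.

    rgb_image: list of rows, each row a list of (R, G, B) tuples.
    weights:   ASSUMED BT.601 luma weights; the paper's Eq. 1 may use others.
    """
    wr, wg, wb = weights
    return [[wr * r + wg * g + wb * b for (r, g, b) in row]
            for row in rgb_image]


# 2 x 2 test image: pure red, pure green, pure blue, and white.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
gray = to_grayscale(image)  # one value per pixel instead of three
```

The output holds one value per pixel, which is what reduces the data volume for the later matching step.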

Gaussian Blur Transformation
Before edge detection, the image is filtered and denoised to reduce the interference of the original noise on edge detection. Gaussian blur is therefore used to reduce the level of image detail and noise, so that the images become smoother and easier to match [15,16]. The Gaussian blur transformation is defined in Eq. 2, where u is the abscissa of the pixel, v is the ordinate of the pixel, and σ is the blur radius, which is the standard deviation of the normal distribution.
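A minimal sketch of how such a blur kernel could be built is given below. It assumes the standard two-dimensional Gaussian, G(u, v) = exp(-(u² + v²) / (2σ²)) / (2πσ²), sampled on an integer grid and normalized so that the weights sum to one; the function name `gaussian_kernel` and the `radius` parameter are illustrative and do not come from the paper.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a normalized (2*radius + 1) x (2*radius + 1) Gaussian kernel."""
    kernel = []
    for v in range(-radius, radius + 1):        # vertical offset from center
        row = []
        for u in range(-radius, radius + 1):    # horizontal offset from center
            g = math.exp(-(u * u + v * v) / (2.0 * sigma * sigma))
            row.append(g / (2.0 * math.pi * sigma * sigma))
        kernel.append(row)
    # Normalize so that blurring does not change the overall brightness.
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]


kernel = gaussian_kernel(radius=1, sigma=1.0)
```

A larger σ spreads the weights further from the center, which smooths more strongly but also removes more detail.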

Edge Detection by Sobel Operator
The surfaces of some objects are smooth and their features are not obvious, which may seriously affect stereo matching. To obtain high-level features and improve the accuracy of stereo matching, the first-derivative Sobel operator can be used to extract gradients by spatial convolution, so that edges are extracted from the image and the features become more obvious [17,18]. The Sobel operator is shown in Eq. 3, where I is the original image, and G_x and G_y are the convolution kernels for the horizontal and vertical directions, respectively.
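The gradient extraction can be sketched as follows, assuming the standard 3 × 3 Sobel kernels and the gradient magnitude sqrt(G_x² + G_y²) as the edge response; the exact form of Eq. 3 may differ. The helper names are hypothetical, and border pixels are simply left at zero for brevity.

```python
import math

# Standard 3x3 Sobel kernels (assumed; horizontal and vertical gradients).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def filter3x3(image, kernel, y, x):
    """Correlate a 3x3 kernel with the neighborhood centered at (y, x).

    Correlation and convolution differ only in sign here, which does not
    affect the gradient magnitude.
    """
    return sum(kernel[j][i] * image[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))


def sobel_magnitude(image):
    """Gradient magnitude for interior pixels; border pixels stay 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = filter3x3(image, SOBEL_X, y, x)
            gy = filter3x3(image, SOBEL_Y, y, x)
            out[y][x] = math.hypot(gx, gy)
    return out


# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(step)
```

On this step image the interior pixels next to the edge respond strongly in the horizontal direction only, which is the kind of feature that later helps matching on otherwise smooth surfaces.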

Model of the Binocular Stereo Camera
The depth of an object refers to the distance between the camera and the object. Distance is commonly measured by ultrasonic or laser ranging, but these methods can only measure a specific point. We measured distance by arranging left and right cameras to simulate the parallax obtained by human binocular vision. Binocular parallax is the position difference between the imaging pixel coordinates of the same point in the left and right cameras. Stereo matching can therefore be performed on the images obtained by the left and right cameras, and the cost calculation then yields three-dimensional information in the real scene for each point of the image [19,20]. The algorithm uses the principle of similar triangles to construct a parallax distance calculation model, which is shown in Figure 1.
In Figure 1, C_l and C_r are the left and right camera sensors, respectively, and O_l and O_r are the corresponding focal points. P is the point to be measured, P_l and P_r are its imaging points on the camera lenses, and x_l and x_r are the corresponding pixel offset distances. B is the baseline of the left and right cameras, that is, the distance between C_l and C_r. Triangles PP_lO and C_lO_lP_l are similar, as are triangles PP_rO and C_rO_rP_r. Triangles PP_lP_r and PC_lC_r are also similar, so Eq. 4 can be obtained, where Z is the distance from the image sensor to the target object, that is, the distance from point P to the baseline B, f is the focal length of the camera, x_l is the abscissa of the imaging point in the left image, and x_r is the abscissa of the imaging point in the right image.
The distance from the image sensor to the target object is then given by Eq. 5.
$$Z = \frac{Bf}{x_l - x_r} \qquad (5)$$
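As a worked example of Eq. 5, the sketch below converts a disparity into a distance. The 0.10 m baseline matches the camera spacing reported for the experimental platform; the focal length of 1,200 pixels and the pixel coordinates are hypothetical values chosen only for illustration, and the function name is not from the paper.

```python
def depth_from_disparity(x_left, x_right, baseline_m, focal_px):
    """Distance Z = B * f / (x_l - x_r), following Eq. 5.

    x_left, x_right: abscissas of the same point in the left and right images
                     (pixels, after rectification).
    baseline_m:      baseline B in meters.
    focal_px:        focal length f expressed in pixels (assumed value below).
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return baseline_m * focal_px / disparity


# Hypothetical match: 40 px disparity with B = 0.10 m and f = 1200 px.
z = depth_from_disparity(x_left=640.0, x_right=600.0,
                         baseline_m=0.10, focal_px=1200.0)
# z is about 3.0 (meters)
```

Because Z is inversely proportional to the disparity, a one-pixel matching error matters more for distant objects (small disparity) than for near ones.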

Improved Sum of Squared Difference
The stereo matching cost measures the similarity between an area of one image and an area of the other image. Common stereo matching algorithms include Sum of Absolute Difference (SAD), Sum of Squared Difference (SSD), Normalization Cross-Correlation (NCC), Census, etc. [21][22][23]. Among these algorithms, SSD has the lowest implementation complexity, but its matching accuracy is relatively poor. We therefore optimized SSD and propose an improved SSD, which improves the matching accuracy while keeping the original lightweight calculation.
The improved Sum of Squared Difference is shown in Eq. 6, where I_l is the left image, I_r is the right image, c is the center area of the target window, r is the edge area of the target window, d is the coordinate difference of the target window between the left image and the right image, x is the horizontal coordinate, y is the vertical coordinate, and a is the weight coefficient.
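One possible reading of this definition is sketched below: squared differences in the center area c are multiplied by the weight a, while those in the edge area r keep weight 1. This is an assumption made for illustration, not a transcription of Eq. 6; the square shape of the center area, the parameter names `half` and `center_half`, and the function name are all hypothetical.

```python
def weighted_ssd(left, right, x, y, d, half, center_half, a):
    """Center-weighted SSD between the window at (x, y) in `left`
    and the window at (x - d, y) in `right`.

    Pixels within `center_half` of the window center (area c) get weight `a`;
    the remaining window pixels (area r) get weight 1. ASSUMED form.
    """
    cost = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = left[y + dy][x + dx] - right[y + dy][x + dx - d]
            in_center = max(abs(dx), abs(dy)) <= center_half
            cost += (a if in_center else 1.0) * diff * diff
    return cost


# Synthetic pair: the right image is the left image shifted by 2 pixels.
left_row = [0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0]
right_row = left_row[2:] + [0, 0]
left = [left_row[:] for _ in range(3)]
right = [right_row[:] for _ in range(3)]

cost_true = weighted_ssd(left, right, x=5, y=1, d=2, half=1, center_half=0, a=2.0)
cost_wrong = weighted_ssd(left, right, x=5, y=1, d=0, half=1, center_half=0, a=2.0)
```

With a > 1, a mismatch at the window center is penalized more than the same mismatch at the window border, while the cost still needs only one multiply-accumulate per pixel.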

EXPERIMENT AND DISCUSSION
The platform was equipped with a left camera and a right camera. Before acquiring images, the parameters of the binocular camera were matched, and the correction matrix parameters between the left and right cameras were obtained to keep them consistent. During the measurement, two frames, one from each camera, were captured at the same time, and the images were calibrated. Gaussian blur was then used to smooth the images, grayscale transformation was performed, and the Sobel operator was used for edge detection. Finally, the improved SSD was used to perform stereo matching and disparity calculation on the target image, and the distance value corresponding to each point was obtained and combined into a distance value matrix. The processing flowchart of the system is shown in Figure 2.

System Hardware Architecture
The experimental platform is shown in Figure 3. A Raspberry Pi 4B embedded computer was used as the main processor of the experimental platform. The Raspberry Pi 4B has an ARM Cortex-A72 quad-core processor with a main frequency of 1.5 GHz, 2 GB of LPDDR4 memory, two USB 3.0 ports, two USB 2.0 ports, two micro HDMI ports, and a gigabit Ethernet port, and it can be powered directly at 5 V through a USB Type-C port. The operating system was Ubuntu 18.04, and Python 3.7 and OpenCV 2.0 environments were deployed. The image sensor consisted of left and right cameras placed in parallel, with a spacing of 10 cm between them, which was called the center baseline. The cameras were connected to the laptop through USB ports. Each camera was a low-sensitivity Sony high-speed CMOS camera with a resolution of 1,920 × 1,080 pixels that could capture 60 frames per second. To reduce the difficulty of stereo matching, the two cameras were kept on the same horizontal line, and their inclination relative to the horizontal plane was set to be consistent.

Image Preprocessing
Image preprocessing mainly included image correction, Gaussian smoothing, and grayscale transformation. Precalibrated differential parameters of the left and right cameras were applied for image correction, so that the initial coordinates of the left and right images were consistent. The transformation effect is shown in Figure 4, in which a red straight line is drawn on the images. As can be seen from the part in the yellow box, before calibration the left car light was completely above the red line in the left image, but a small part of it was still below the red line in the right image. After calibration, the left car light was at nearly the same position relative to the red line in the left and right images, and pixels with the same feature were almost in the same row of the two images. This is a prerequisite for accurate stereo matching and was conducive to subsequent feature matching.
Then, Gaussian smoothing was performed on the image. Noise in the image was filtered so that the transitions between image levels became smoother, and the influence of noise on subsequent stereo matching was reduced. Taking the license plate area as an example, the transformation effect is shown in Figure 5. After Gaussian smoothing, the gradient of the image was smoother, the sense of hierarchy was reduced, and the feature difference between the left and right cameras caused by their different viewing angles was reduced. This made stereo matching easier and improved the matching accuracy.
Finally, the grayscale transformation was performed to reduce the amount of calculation for stereo matching. By mixing the RGB channels in proportion, two-dimensional matrix data were generated, and the original features of the image were retained as much as possible while the amount of data was reduced.

Edge Extraction
Some objects in the image had smooth surfaces. For example, there was a slight reflection on the surface of the car body, and because of the reflection the image appeared to lack texture features. Unobvious features would lead to inaccurate matching results and a low stereo matching recognition rate. To highlight the details of the smooth parts, edge extraction was performed on the image. Gradient operations were performed in the vertical and horizontal directions of the image, which magnified the small edge changes on surfaces with inconspicuous texture and highlighted the high-level features of the smooth parts. The transformation effect is shown in Figure 6. The processed image had obvious features in the areas near the license plate and the lower part of the car body. Because the originally smooth surfaces now had gradient edge features, both the accuracy of stereo matching and the stereo matching recognition rate could be improved for regions with inconspicuous texture.

Stereoscopic Matching of Left and Right Photos
Stereo matching was the most critical step of this algorithm. The same feature areas of the left and right images were matched: a feature from the left image was searched for along the same horizontal line in the right image, and the area with the highest similarity to the left-image feature was found. The depth value of the feature area could then be calculated from the horizontal coordinate shift of the same feature, since this shift (the disparity) determines the distance from the baseline to the object. The recognition accuracy of stereo matching determined the ranging accuracy. Compared with other stereo matching algorithms, the gray value of the pixel in [...] matching results.
Frontiers in Physics | www.frontiersin.org | August 2021 | Volume 9 | Article 737336
To verify the accuracy and performance of our algorithm, an experimental platform was used to place the binocular camera at a specific distance from the target, and image acquisition was performed. The binocular matching (BM) algorithm proposed by Du Jiang et al. [24] was run at the same time as a comparison, and the data were recorded. The measurement results are shown in Table 1. The actual measurement accuracy of the improved SSD algorithm was 95.06% on average, which was 1.44% higher than that of the comparison algorithm. The improved SSD algorithm therefore had a higher accuracy and smaller fluctuations in accuracy error, which meets actual measurement requirements.
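The row-wise search described in this section can be sketched as a winner-take-all loop over candidate shifts. For brevity the sketch uses a plain SSD window cost; a center-weighted cost could be passed in through the `cost` parameter instead. The function names, the search range `max_d`, and the synthetic images are hypothetical and only illustrate the search, not the authors' implementation.

```python
def ssd(left, right, x, y, d, half):
    """Plain SSD between the window at (x, y) in left and (x - d, y) in right."""
    return sum((left[y + dy][x + dx] - right[y + dy][x + dx - d]) ** 2
               for dy in range(-half, half + 1)
               for dx in range(-half, half + 1))


def best_disparity(left, right, x, y, max_d, half=1, cost=ssd):
    """Try each candidate shift d on row y and keep the one with the lowest cost."""
    best_d, best_cost = 0, float("inf")
    for d in range(0, max_d + 1):
        if x - d - half < 0:          # window would leave the right image
            break
        c = cost(left, right, x, y, d, half)
        if c < best_cost:
            best_d, best_cost = d, c
    return best_d


# Synthetic rectified pair: the right image is the left image shifted by 2 px.
left_row = [0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0]
right_row = left_row[2:] + [0, 0]
left = [left_row[:] for _ in range(3)]
right = [right_row[:] for _ in range(3)]

d = best_disparity(left, right, x=5, y=1, max_d=4)
```

Restricting the search to the same row is only valid after the image correction step, which is why the row alignment shown in Figure 4 is a prerequisite for this search.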

CONCLUSION
In this paper, an algorithm that uses left and right cameras to obtain images of the target from different angles was proposed. The images were smoothed to reduce noise, and grayscale transformation was performed to reduce the number of stereo matching operations. Then, the pixels of the corresponding area were matched along the same horizontal line of the right image. The gray-level differences between corresponding pixels were accumulated into a matching cost, and the central key area was weighted so that the matching of the central area played a more important role in the overall matching result. The algorithm improved the matching accuracy and obtained more accurate measurement results, which met the requirements of distance measurement.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
YW and YL designed this project. YL carried out most of the experiments and data analysis. YG performed part of the experiments and helped with discussions during manuscript preparation. YW and YL contributed to the data analysis and correction and provided helpful discussions on the experimental results. All authors have read and agreed to the published version of the manuscript.
Table 1 note: The first row is the actual distance between the binocular camera and the target, the second row is the distance measured by the improved SSD algorithm, and the third row is the distance measured by the comparison algorithm; the unit is the meter. The fourth row is the accuracy difference between our algorithm and the comparison algorithm, expressed as a percentage.