Edited by: Hang Su, Fondazione Politecnico di Milano, Italy

Reviewed by: Jing Luo, Wuhan Institute of Technology, China; Jiahao Chen, Institute of Automation (CAS), China; Chao Cheng, Jilin University, China

This article was submitted to Original Research Article, a section of the journal Frontiers in Neurorobotics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Planar motion constraints arise in visual odometry (VO) and SLAM for Automated Guided Vehicles (AGVs) and mobile robots in general. Conventionally, two-point solvers are embedded in RANdom SAmple Consensus (RANSAC) to reject outliers in real data, but their performance degrades as the outlier ratio increases. This study proposes a globally-optimal Branch-and-Bound (BnB) solver for relative pose estimation under general planar motion, which aims to find the globally-optimal solution even in quite noisy environments. Through a suitable reformulation of the motion equation, we decouple the relative pose into relative rotation and translation so that a simplified bounding strategy can be applied, which improves the efficiency of the BnB technique. Experimental results support the global optimality and demonstrate that the proposed method is more robust than existing approaches. In addition, the proposed algorithm outperforms state-of-the-art methods in global optimality under varying levels of outliers.

Recent decades have witnessed the rapid development of frame-to-frame relative pose estimation in the field of computer vision, especially in visual odometry (VO), SLAM (Mur-Artal et al.,

In visual geometry, all degree-of-freedom (DoF) relative pose problems between consecutive frames can be dealt with from 2D-2D point correspondences. Basically, eight points are sufficient to recover relative pose in 5-DoF (Hartley and Zisserman,

Common solutions to the relative pose estimation problem are conducted based on accurate point correspondences (Nistér,

In this study, we propose a novel Branch-and-Bound (BnB) method to obtain globally-optimal inlier maximization for relative pose estimation under planar motion. To verify the feasibility and validity of the proposed method, we conduct several experiments on synthetic and real data. Different types of noise and varying levels of outliers are taken into consideration. Besides, performances on two real datasets KITTI (Geiger et al.,

We propose a globally-optimal BnB algorithm for the relative pose problem under the planar motion constraint, which is suitable for mobile robots and AGVs.

Owing to a dedicated reformulation of the motion equations, the relative pose can be decoupled into planar rotation and translation, which greatly improves the efficiency of the BnB technique.

Our experimental results show that the proposed method is more robust under both image noise and outliers.

The rest of this study is organized as follows. Related work is reviewed in Section 2. Brief notations and the main algorithm are given in Section 3. In Section 4, comprehensive experiments on synthetic and real data are conducted to evaluate the performance of our BnB approach. Finally, we conclude our study in Section 5.

Epipolar geometry is utilized to deal with the 5-DoF relative pose problem in multi-view geometry. It introduces the essential matrix to describe the relationship between different views and projected points. Basically, 8 points are sufficient to deal with the 5-DoF relative pose problem (Hartley and Zisserman,

Recently, some solvers (Raposo and Barreto,

In addition to restricting the DoF of camera motions, the minimal number of feature matches required for the relative pose problem also decreases when additional sensors are used. Stereo sensors capture two images at once, and the disparity map can be computed to recover depth information, which helps resolve the scale of the relative translation. Besides, RGB-D sensors provide depth information directly. In terms of high DoF of camera motions, the methods in Xu et al. (

This section first illustrates epipolar geometry under the planar motion constraint and then describes in detail the proposed BnB method, which searches for the parameters that maximize the energy function.

Epipolar geometry describes the inherent geometric relationship between two views and is the common tool for relative pose problems. Algebraically, the 3 × 3 essential (or fundamental) matrix, composed of the relative rotation and translation, is introduced to express the relationship between projected points. Given a 3D point projected onto two normalized image planes, the following relation can be obtained from epipolar geometry.

where [·]_{×} denotes the skew-symmetric matrix of a vector.

Intuitively, the rotation matrix R_{y} from Location 1 to 2 can be written as:

the translation matrix

then combining Equation 1 and _{×}

Planar motion from Location 1 to 2 in top-view. The relative pose can be described by θ and ϕ, where θ represents the yaw of the vehicle and ϕ represents the direction of translation. ρ denotes distance between two locations.
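As a minimal sketch of the planar-motion model in the figure, the following pure-Python snippet builds an essential matrix from θ, ϕ, and ρ and checks the epipolar constraint on one synthetic point. The convention used here is an assumption made for illustration: points are transformed as X1 = R_y(θ) X2 + t with t = ρ(sin ϕ, 0, cos ϕ), and the constraint is written as p1ᵀ E p2 = 0. All function names are hypothetical and not taken from the article.

```python
import math

def rot_y(theta):
    """Rotation about the y-axis (yaw) by theta, as a 3x3 nested list."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def skew(t):
    """Skew-symmetric matrix [t]_x such that [t]_x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def planar_essential(theta, phi, rho=1.0):
    """E = [t]_x R_y(theta) with t = rho * (sin(phi), 0, cos(phi)) (assumed convention)."""
    t = [rho * math.sin(phi), 0.0, rho * math.cos(phi)]
    return matmul(skew(t), rot_y(theta))

def epipolar_residual(E, p1, p2):
    """p1^T E p2 for homogeneous normalized image points p1 and p2."""
    Ep2 = matvec(E, p2)
    return sum(p1[i] * Ep2[i] for i in range(3))

# Synthetic check: one 3D point observed before and after a planar motion.
theta, phi, rho = 0.3, -0.2, 1.5
X2 = [0.7, -0.4, 6.0]
t = [rho * math.sin(phi), 0.0, rho * math.cos(phi)]
X1 = [r + ti for r, ti in zip(matvec(rot_y(theta), X2), t)]
p1 = [X1[0] / X1[2], X1[1] / X1[2], 1.0]
p2 = [X2[0] / X2[2], X2[1] / X2[2], 1.0]
residual = epipolar_residual(planar_essential(theta, phi, rho), p1, p2)
```

Under this convention the essential matrix has only four non-zero entries, which depend on θ − ϕ and ϕ alone (up to the scale ρ). This is the structure that the decoupling into two angles relies on.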

Consider the form of Equation 4. Using the auxiliary angle formula, we can rewrite the equation as:

where θ_{1} = θ − ϕ, ϕ_{1} = arctan(_{2}), θ_{2} = ϕ, ϕ_{2} = −arctan(_{1}), _{1} and _{2} is based on the assumption that _{1} and _{2} are non-negative. For negative _{1} and _{2}, we just need to additionally discuss _{1} and _{2} are non-negative.
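The auxiliary-angle rewriting can be checked numerically. The sketch below assumes that the expanded constraint has the form y1(x2 cos(θ − ϕ) + sin(θ − ϕ)) + y2(sin ϕ − x1 cos ϕ) = 0 for normalized points (x1, y1) and (x2, y2), which is consistent with the angle definitions above but is not copied from the article. The amplitudes keep the sign of y1 and y2, which is one possible way to avoid a separate case for negative values. All names are hypothetical.

```python
import math

def bilinear_residual(p1, p2, theta, phi):
    """Expanded planar epipolar residual (scale rho factored out, assumed form)."""
    x1, y1 = p1
    x2, y2 = p2
    return (y1 * (x2 * math.cos(theta - phi) + math.sin(theta - phi))
            + y2 * (math.sin(phi) - x1 * math.cos(phi)))

def decoupled_terms(p1, p2):
    """Signed amplitudes and phases (a1, phi1, a2, phi2) of the two sinusoids."""
    x1, y1 = p1
    x2, y2 = p2
    a1 = y1 * math.sqrt(x2 * x2 + 1.0)
    a2 = y2 * math.sqrt(x1 * x1 + 1.0)
    return a1, math.atan(x2), a2, -math.atan(x1)

def decoupled_residual(p1, p2, theta1, theta2):
    """a1 * sin(theta1 + phi1) + a2 * sin(theta2 + phi2)."""
    a1, phi1, a2, phi2 = decoupled_terms(p1, p2)
    return a1 * math.sin(theta1 + phi1) + a2 * math.sin(theta2 + phi2)

# Both forms agree for arbitrary inputs, including negative y values.
p1, p2 = (0.25, -0.10), (0.18, -0.07)
theta, phi = 0.4, 0.15
direct = bilinear_residual(p1, p2, theta, phi)
decoupled = decoupled_residual(p1, p2, theta - phi, phi)
```

Each sinusoid depends on only one of the two angles, so its minimum and maximum over an interval can be computed independently of the other.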

Next, given _{1}, θ_{2}) as:

where

Our goal is to maximize this function over θ_{1} and θ_{2}. However, the objective is non-smooth and non-concave, so its optimal solution is difficult to obtain.
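To make the inlier-counting objective concrete, the following sketch evaluates it on a few noise-free synthetic correspondences plus two gross outliers. It assumes that the objective counts correspondences whose absolute decoupled residual is within a tolerance ε, and it reuses the assumed convention X1 = R_y(θ) X2 + t with t = ρ(sin ϕ, 0, cos ϕ). The function names and the test data are hypothetical.

```python
import math

def residual(corr, theta1, theta2):
    """Decoupled epipolar residual of one correspondence ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = corr
    return (y1 * math.hypot(x2, 1.0) * math.sin(theta1 + math.atan(x2))
            + y2 * math.hypot(x1, 1.0) * math.sin(theta2 - math.atan(x1)))

def inlier_count(corrs, theta1, theta2, eps=1e-3):
    """Number of correspondences whose absolute residual is within eps."""
    return sum(1 for c in corrs if abs(residual(c, theta1, theta2)) <= eps)

def project_pair(X2, theta, phi, rho=1.0):
    """Normalized image points (p1, p2) of a 3D point under X1 = R_y(theta) X2 + t."""
    x, y, z = X2
    c, s = math.cos(theta), math.sin(theta)
    z1 = -s * x + c * z + rho * math.cos(phi)
    return (((c * x + s * z + rho * math.sin(phi)) / z1, y / z1), (x / z, y / z))

theta_gt, phi_gt = 0.3, 0.1
points = [(1.0, 2.0, 5.0), (-1.5, -1.8, 6.0), (0.5, -2.2, 4.5), (-0.8, 1.5, 7.0)]
corrs = [project_pair(X, theta_gt, phi_gt) for X in points]
corrs += [((0.3, 0.2), (-0.4, -0.3)), ((-0.2, 0.4), (0.5, 0.1))]  # gross outliers

# The objective is evaluated in the decoupled variables theta1 = theta - phi, theta2 = phi.
count_at_truth = inlier_count(corrs, theta_gt - phi_gt, phi_gt)
```

Because the count is a sum of indicator functions, it is piecewise constant in θ_{1} and θ_{2}, which is why gradient-based optimization does not apply directly.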

To obtain the optimal solution, we design a BnB algorithm, a globally-optimal solver based on search and iteration. By giving higher priority to branches of sub-problems according to well-designed bounds, BnB searches for globally optimal solutions efficiently. The variables θ_{1} and θ_{2} are each searched over a domain that initially ranges from −π to π. For the branch step, we divide each of the two domains into two equal parts. For the bound step, we first rewrite our objective function as:

The lower and upper bounds are considered separately. Evaluating the objective at any θ_{1} and θ_{2} selected from the current sub-domains yields a valid lower bound. Since our objective is a maximization over θ_{1} and θ_{2} within the given sub-domains, we hope that

To express more clearly, we denote

that equals _{i} can be expressed as:

then it is not hard to relax the indicator function

Thus, the upper bound can be obtained as:

Note that the right-hand side of Equation 14 does not depend on θ_{1} and θ_{2}, so the max operator can be dropped. What remains is to compute the extrema of the two terms over the sub-domains of θ_{1} and θ_{2}. Since the two variables are decoupled, we only need to compute the minimum and maximum of two trigonometric functions separately.

According to different range of _{1}, _{2}, and

It is worth noting that when the two sub-domains each collapse to a single point, the upper bound and the lower bound coincide, which ensures the convergence of the proposed BnB method.
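One way to write the relaxed indicator described above is sketched below. The notation is assumed for illustration and may differ from the article's own equations: f_{1,i}(θ_{1}) and f_{2,i}(θ_{2}) denote the two decoupled sinusoidal terms of the i-th correspondence, Θ_{1} and Θ_{2} denote the current sub-domains, and underlines and overlines denote the minimum and maximum of a term over its sub-domain.

```latex
% Assumed notation, not taken verbatim from the article.
\underline{f}_{k,i} = \min_{\theta_k \in \Theta_k} f_{k,i}(\theta_k), \qquad
\overline{f}_{k,i} = \max_{\theta_k \in \Theta_k} f_{k,i}(\theta_k), \qquad k \in \{1, 2\}

\mathbb{1}\left( \left| f_{1,i}(\theta_1) + f_{2,i}(\theta_2) \right| \le \varepsilon \right)
\;\le\;
\mathbb{1}\left( \underline{f}_{1,i} + \underline{f}_{2,i} \le \varepsilon
\;\wedge\; \overline{f}_{1,i} + \overline{f}_{2,i} \ge -\varepsilon \right)
\qquad \forall\, \theta_1 \in \Theta_1,\ \theta_2 \in \Theta_2
```

Summing the right-hand side over all correspondences gives an upper bound on the inlier count within the sub-domain. When both sub-domains shrink to single points, the minima and maxima equal the function values, so the relaxed indicator reduces to the original one.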

BnB for relative pose estimation from consecutive frames.
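The overall procedure can be sketched as a best-first BnB over boxes in (θ_{1}, θ_{2}). This is a minimal illustration under stated assumptions, not the authors' implementation: the decoupled residual form, the interval-based upper bound, the box-centre lower bound, the `min_width` safeguard, and all names are assumptions introduced here.

```python
import heapq
import math

TWO_PI = 2.0 * math.pi

def sin_range(lo, hi):
    """Minimum and maximum of sin over [lo, hi]."""
    if hi - lo >= TWO_PI:
        return -1.0, 1.0
    vals = [math.sin(lo), math.sin(hi)]
    # First peak (pi/2 + 2k*pi) and first trough (-pi/2 + 2k*pi) at or after lo.
    peak = math.pi / 2 + TWO_PI * math.ceil((lo - math.pi / 2) / TWO_PI)
    trough = -math.pi / 2 + TWO_PI * math.ceil((lo + math.pi / 2) / TWO_PI)
    if peak <= hi:
        vals.append(1.0)
    if trough <= hi:
        vals.append(-1.0)
    return min(vals), max(vals)

def term_range(a, phase, lo, hi):
    """Range of a * sin(theta + phase) for theta in [lo, hi]; a may be negative."""
    mn, mx = sin_range(lo + phase, hi + phase)
    return (a * mn, a * mx) if a >= 0 else (a * mx, a * mn)

def terms(corr):
    """Signed amplitudes and phases of the two decoupled sinusoids (assumed form)."""
    (x1, y1), (x2, y2) = corr
    return (y1 * math.hypot(x2, 1.0), math.atan(x2),
            y2 * math.hypot(x1, 1.0), -math.atan(x1))

def count_at(T, t1, t2, eps):
    """Inlier count at a single point; any feasible point gives a lower bound."""
    return sum(abs(a1 * math.sin(t1 + p1) + a2 * math.sin(t2 + p2)) <= eps
               for a1, p1, a2, p2 in T)

def upper_bound(T, box, eps):
    """Count correspondences whose residual interval over the box meets [-eps, eps]."""
    (l1, h1), (l2, h2) = box
    n = 0
    for a1, p1, a2, p2 in T:
        mn1, mx1 = term_range(a1, p1, l1, h1)
        mn2, mx2 = term_range(a2, p2, l2, h2)
        n += (mn1 + mn2 <= eps) and (mx1 + mx2 >= -eps)
    return n

def bnb(corrs, eps=1e-3, min_width=1e-6):
    """Best-first BnB; returns (inlier count, theta1, theta2)."""
    T = [terms(c) for c in corrs]
    root = ((-math.pi, math.pi), (-math.pi, math.pi))
    best = (count_at(T, 0.0, 0.0, eps), 0.0, 0.0)
    heap = [(-upper_bound(T, root, eps), root)]
    while heap:
        neg_ub, ((l1, h1), (l2, h2)) = heapq.heappop(heap)
        if -neg_ub <= best[0]:
            break  # no remaining box can beat the incumbent
        if max(h1 - l1, h2 - l2) < min_width:
            continue  # safeguard against endless splitting
        m1, m2 = 0.5 * (l1 + h1), 0.5 * (l2 + h2)
        for b1 in ((l1, m1), (m1, h1)):       # branch: halve each domain
            for b2 in ((l2, m2), (m2, h2)):
                ub = upper_bound(T, (b1, b2), eps)
                if ub <= best[0]:
                    continue  # prune
                c1, c2 = 0.5 * sum(b1), 0.5 * sum(b2)
                lb = count_at(T, c1, c2, eps)
                if lb > best[0]:
                    best = (lb, c1, c2)
                heapq.heappush(heap, (-ub, (b1, b2)))
    return best

def project_pair(X2, theta, phi, rho=1.0):
    """Normalized points (p1, p2) under the assumed model X1 = R_y(theta) X2 + t."""
    x, y, z = X2
    c, s = math.cos(theta), math.sin(theta)
    z1 = -s * x + c * z + rho * math.cos(phi)
    return (((c * x + s * z + rho * math.sin(phi)) / z1, y / z1), (x / z, y / z))

points = [(1.0, 2.0, 5.0), (-1.5, -1.8, 6.0), (0.5, -2.2, 4.5),
          (-0.8, 1.5, 7.0), (2.0, 1.2, 6.5), (-2.2, -1.0, 5.5)]
corrs = [project_pair(X, 0.3, 0.1) for X in points]
corrs += [((0.3, 0.2), (-0.4, -0.3)), ((-0.2, 0.4), (0.5, 0.1))]  # gross outliers
best_count, theta1, theta2 = bnb(corrs)
```

With θ_{1} = θ − ϕ and θ_{2} = ϕ, the pose follows as θ = θ_{1} + θ_{2} and ϕ = θ_{2}. In this sketch the absolute residual is unchanged when π is added to both angles, so the returned angles may correspond to the reversed translation direction.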

In this section, we conduct experiments on synthetic and real data to evaluate the effectiveness and robustness of the proposed BnB method. To reject outliers, the algorithms under comparison are combined with RANSAC. The RANSAC parameters are kept constant within each experiment. All experiments are executed on an Intel Core i7-9750H CPU. Our proposed BnB method is compiled and executed in C++ on Windows. The compared methods are implemented in Matlab R2020a and may differ slightly from the original articles. Owing to the randomness inherent in RANSAC, the estimated poses are not fully consistent across runs but are quite close.

We evaluate the effectiveness and robustness of our BnB method with synthetic data. The variables of the experiments are image noise and non-planar noise. Additionally, to evaluate the robustness and global optimality, we consider an experiment under different outlier ratios. Four different algorithms [1AC (Hartley and Zisserman,

To generate 3D points in space, we create 50 different virtual planes randomly and sample points distributed in the range of −5–5 m (_{gt}, ϕ_{gt}), we randomly choose them from

We replace the epipolar constraint Equation 4 with an inequality

The inequality is used as the criterion for judging whether a feature match belongs to the inlier set. In all synthetic experiments, ε is fixed to 10^{−3}. Besides, the number of iterations of the RANSAC scheme is determined by:

where
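For orientation, a commonly used rule for choosing the number of RANSAC iterations is N = log(1 − p) / log(1 − (1 − e)^s), where p is the desired confidence, e the outlier ratio, and s the minimal sample size. Whether the article uses exactly this rule is an assumption, and the function name below is hypothetical.

```python
import math

def ransac_iterations(confidence, outlier_ratio, sample_size):
    """Iterations needed so that, with the given confidence, at least one
    minimal sample is outlier-free (standard textbook rule, assumed here)."""
    all_inlier_prob = (1.0 - outlier_ratio) ** sample_size
    if all_inlier_prob >= 1.0:
        return 1  # no outliers: a single sample suffices
    if all_inlier_prob <= 0.0:
        raise ValueError("no outlier-free sample is possible")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - all_inlier_prob))

# Smaller minimal samples need far fewer iterations at the same outlier ratio.
n_two_point = ransac_iterations(0.99, 0.5, 2)
n_five_point = ransac_iterations(0.99, 0.5, 5)
```

The iteration count grows quickly with the sample size, which is one reason why solvers that need fewer points are attractive inside RANSAC.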

For experiments with image noise as the variable, we add image noise drawn from Gaussian distributions N(0, σ^{2}) with the SD σ ranging from 0 to 1. Under each σ, the median rotation and translation errors over 200 repetitions are used for evaluation.

Evaluations of five algorithms on different image noise. The non-planar noise is not added. The left image shows rotation error with different image noise and the right one represents translation error with different image noise. 1AC, 2pt, 5pt, and 8pt are the studies of Hartley and Zisserman (

Additionally, we add non-planar noise in rotation and translation to simulate more realistic road conditions. Following (Choi and Kim,

Evaluations of five algorithms on different non-planar noise. The image noise is set as σ = 0.5. The left image shows rotation error with different non-planar noise and the right one represents translation error with different non-planar noise. 1AC, 2pt, 5pt, and 8pt are the studies of Hartley and Zisserman (

Apart from the image noise and non-planar noise, feature matching with, e.g., ASIFT and VLFeat, produces many mismatches. Since our BnB method aims to obtain a globally-optimal inlier maximization solution of the relative pose, we consider a common metric

The boxplot of

We evaluate the effectiveness and robustness of our BnB method mainly on the KITTI odometry dataset (Geiger et al., _{R} and ε_{t}. Besides,

The proposed BnB method is compared with 2 different algorithms [1AC (Choi and Kim, ^{−3} since real data undergoes higher non-planar noise and mismatches. The number of iterations in RANSAC is fixed to 100 through experiments. For evaluating rotation and translation error, we take the median value on each sequence to avoid the influence of failures by RANSAC. The mean value of

Comparison of three methods on 11 sequences of KITTI odometry dataset.

ε_{R} |
ε_{t} |
||||||||
---|---|---|---|---|---|---|---|---|---|

00 | 0.0337 | 0.1956 | 0.8346 | 3.7567 | 39.3507 | 39.5037 | |||

01 | 0.0123 | 0.1880 | 0.2596 | 2.7532 | 44.4045 | 45.0336 | |||

02 | 0.0100 | 0.1510 | 0.5691 | 2.2629 | 40.9652 | 41.3803 | |||

03 | 0.0237 | 0.1433 | 1.3076 | 1.2426 | 41.2638 | 41.5613 | |||

04 | 0.1231 | 0.1270 | 1.0309 | 1.7233 | 45.0222 | 45.9269 | |||

05 | 0.0053 | 0.1514 | 0.1297 | 3.4481 | 41.6315 | 41.7725 | |||

06 | 0.0427 | 0.1611 | 0.6721 | 2.5658 | 42.2473 | 42.7400 | |||

07 | 0.0033 | 0.1285 | 0.3313 | 4.2962 | 42.1764 | 42.3755 | |||

08 | 0.0100 | 0.1374 | 0.0608 | 3.0336 | 41.5486 | 41.7359 | |||

09 | 0.0152 | 0.1366 | 0.5864 | 2.5314 | 40.8648 | 41.4925 | |||

10 | 0.0076 | 0.1391 | 0.5296 | 3.0361 | 40.8967 | 41.2442 |

ε_{R}, ε_{t}, and inlier_max denote the rotation error, translation error, and average maximum number of matched points, respectively. The bold values indicate the lowest error

Visualization of night scenes using proposed BnB method on noisy matchings with sequences of KITTI odometry dataset. The green lines represent correct correspondences and the red lines denote mismatches. In each pair, the lower image shows the scene after moving from the upper one. Better viewed in color.

To give a comprehensive depiction of the performance of solvers above, we exhibit the relationship between the rotational and translation error defined in KITTI VO and SLAM Evaluation and the path length in ^{−4} to show the performance more clearly. As shown in

Rotation and translation error of three algorithms. They are exhibited with respect to path length. 1AC, 2pt are the studies of Choi and Kim (

Besides, we randomly pick five scenes from the Malaga dataset in five different sequences to help evaluate the global optimality of the proposed method under noisy cases and ^{−3} and the number of RANSAC schemes is fixed to 1,000 to decrease the randomness.

Five pairs of consecutive frames selected from the Malaga dataset randomly.

Straight path | 14 | 15 | |

Through road | 16 | 18 | |

Roundabout | 12 | 11 | |

Roundabout with traffic | 16 | 17 | |

Loop closure | 19 | 19 |

Inliers and outliers in five scenes of the Malaga dataset. The green lines represent correct correspondences and the red lines denote mismatches. In each pair, the lower image shows the scene after moving from the upper one. Better viewed in color.

Finally, due to its globally-optimal searching strategy, the proposed BnB method is more time-consuming than other non-minimal or minimal solvers. For 50 point correspondences from each pair of consecutive images and an epipolar-constraint tolerance ε of 10^{−4}, BnB takes 18.3203 s. When ε is relaxed to 10^{−3}, the runtime decreases to 4.1867 s, at the cost of some precision.

Recent studies on relative pose estimation aim at more robust and faster methods, which will improve the performance of AGVs and robots. To enhance robustness, we propose a novel globally-optimal BnB method for relative pose estimation of a camera under planar motion. Based on the reasonable assumption of planar motion for cameras fixed on self-driving cars or ground robots, our BnB method takes feature correspondences in the normalized camera coordinate system as input and obtains the globally-optimal solution for the relative pose between consecutive frames. Results of synthetic experiments show that the proposed BnB method maximizes inliers effectively even at high outlier levels. Additional experiments on the KITTI and Malaga datasets further confirm that our BnB method is more robust than existing approaches. However, due to its globally-optimal searching strategy, the proposed method is more time-consuming. In future work, we expect to find a tighter bound to speed up convergence.

Publicly available datasets were analyzed in this study. This data can be found at:

YL is responsible for ensuring that the descriptions are accurate and agreed by all authors and provided the original idea. The conceptualization and methodology were developed by ZL and HL. GC and AK: supervision and validation. RZ is responsible for software and visualization. All authors contributed to the article and approved the submitted version.

This study was financially supported by State Key Laboratory of Vehicle NVH and Safety Technology 2020 Open Fund Grant, Project NVHSKL-202009, the German Research Foundation (DFG), and the Technical University of Munich (TUM) in the framework of the Open Access Publishing Program.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.