Visual Odometry (VO) recovers ego-motion from image sequences by exploiting the consistency between consecutive frames, and has been widely applied in domains ranging from autonomous driving and space exploration to virtual and augmented reality. Although many state-of-the-art learning-based methods have yielded results competitive with classic algorithms, they treat the visual cues across the whole image equally.
The proposed DeepAVO distinguishes and selects the extracted features from two aspects: 1) four branches extract geometric information from the corresponding quadrants of the optical flow; 2) each branch contains two Convolutional Block Attention Module (CBAM) blocks, enabling the model to concentrate on pixels undergoing distinct motion. The contributions can be summarized as follows:
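The four-branch quadrant design can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: `split_quadrants` and `magnitude_attention` are hypothetical helpers, and the magnitude-based weighting merely stands in for the CBAM blocks (which combine channel and spatial attention learned end-to-end).

```python
import numpy as np

def split_quadrants(flow):
    """Split an optical-flow field of shape (H, W, 2) into four quadrants,
    one per branch, mirroring the four-branch design described above.
    Hypothetical helper for illustration only."""
    h, w, _ = flow.shape
    h2, w2 = h // 2, w // 2
    return [
        flow[:h2, :w2],   # top-left quadrant
        flow[:h2, w2:],   # top-right quadrant
        flow[h2:, :w2],   # bottom-left quadrant
        flow[h2:, w2:],   # bottom-right quadrant
    ]

def magnitude_attention(quadrant):
    """Toy spatial attention: weight each pixel by its normalized flow
    magnitude, so pixels with distinct motion dominate. A stand-in for
    the learned CBAM attention, not the actual module."""
    mag = np.linalg.norm(quadrant, axis=-1, keepdims=True)  # (h, w, 1)
    attn = mag / (mag.max() + 1e-8)                         # scale to [0, 1]
    return quadrant * attn

# Each branch receives one attention-weighted quadrant of the flow.
flow = np.random.randn(64, 64, 2).astype(np.float32)
branch_inputs = [magnitude_attention(q) for q in split_quadrants(flow)]
print([b.shape for b in branch_inputs])
```

In the actual network each branch would continue with convolutional layers whose CBAM blocks refine the features; this sketch only shows the quadrant partitioning and a magnitude-based attention surrogate.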