Visual Simultaneous Localization and Mapping (SLAM) is a critical technology in robotics and computer vision that enables real-time state estimation for a wide range of applications. It has become important for tasks such as monocular depth estimation, view synthesis, and 3D human pose reconstruction. However, achieving high tracking accuracy from monocular video alone, without inertial measurements, remains a critical challenge in these applications. Moreover, SLAM algorithms based on deep networks often need significant computational power, which makes them less suitable for online applications. Existing solutions demand high-end GPUs with large memory capacities, limiting their practical use in real-time scenarios.
Existing works have tried different approaches to address these challenges. Some researchers have built deep-learning systems, such as TartanVO, DROID-SLAM, and DPVO, that are trained on synthetic data to improve generalization. These methods show promise in generalizing across different environments without any extra fine-tuning. However, many approaches focus mostly on accuracy within a specific domain, which makes them inefficient in many other applications. Recently, new SLAM techniques based on Gaussian splatting and NeRFs have been developed, but they mainly focus on high-quality reconstruction rather than reliable tracking. Moreover, loop closure techniques are used to correct drift, and mid-term and long-term data association are common in many SLAM systems.
Researchers from Princeton University have proposed DPV-SLAM, an extension of the DPVO odometry system that addresses the limitations of existing deep SLAM approaches. This method introduces a new mechanism for loop closure that avoids common performance issues related to SLAM backends based on deep networks. In addition, DPV-SLAM uses a traditional loop closure mechanism based on classical features, which works alongside the deep-SLAM backend. Evaluated on the EuRoC, KITTI, TUM-RGBD, and TartanAir datasets, DPV-SLAM shows improvements in accuracy, speed, and robustness compared to existing methods.
DPV-SLAM introduces two efficient mechanisms to correct drift: proximity loop closure and classical loop closure. The proximity loop closure detects loops based on camera proximity. It avoids the difficulty of running separate backend and frontend processes in parallel on deep networks by working on a single, shared scene graph that merges odometry with low-cost loop closure factors. To make global optimization efficient, the researchers created a CUDA-accelerated block-sparse implementation of bundle adjustment that works with DPVO’s “patch graph” scene representation. This proximity-based loop closure is much faster than DROID-SLAM’s backend on the EuRoC dataset. The classical loop closure runs on the CPU and uses image retrieval and pose graph optimization to correct scale drift.
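The idea of detecting loops from camera proximity can be illustrated with a minimal Python sketch. This is not the DPV-SLAM implementation: the function name `proximity_loop_candidates`, the `radius` and `min_gap` parameters, and the use of plain Euclidean distance between estimated camera centers are all assumptions made for illustration. The sketch only shows the general pattern of pairing frames that are close in space but far apart in time.

```python
import numpy as np

def proximity_loop_candidates(positions, radius=0.5, min_gap=30):
    """Return (i, j) frame pairs, i < j, that are close in space but far apart in time.

    positions: (N, 3) array of estimated camera centers (hypothetical input).
    radius:    distance threshold for calling two cameras "close" (assumed value).
    min_gap:   minimum frame separation, so neighbouring frames are ignored (assumed value).
    """
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    # Pairwise Euclidean distances between camera centers, shape (n, n).
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Only consider pairs with j - i >= min_gap (upper triangle with an offset).
    i_idx, j_idx = np.triu_indices(n, k=min_gap)
    close = dist[i_idx, j_idx] < radius
    return list(zip(i_idx[close].tolist(), j_idx[close].tolist()))

# Toy trajectory: a camera travels once around a unit circle and returns to its start.
angles = np.linspace(0.0, 2.0 * np.pi, 100)
trajectory = np.stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)], axis=1)
pairs = proximity_loop_candidates(trajectory, radius=0.1, min_gap=50)
print(pairs)
```

In this toy example, the only candidates are pairs that link the first frames of the circle to the last ones, which is where a loop closure factor would be added to the scene graph. A real system would also need to account for viewing direction and the scale ambiguity of monocular estimates, which this sketch ignores.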
The results for DPV-SLAM across various datasets are strong. On the TUM-RGBD dataset, it achieves results comparable to other deep SLAM systems while outperforming classical approaches. On the KITTI dataset, DPV-SLAM achieves the second-lowest average error among all reported methods while running at 39 FPS and effectively addressing scale drift. On EuRoC-MAV, it performs similarly to DROID-SLAM but runs 2.5 times faster while using only a quarter of the memory. Moreover, it achieves 4 times lower error than the base DPVO system, with minimal speed reduction and memory increase. These results demonstrate the versatility and efficiency of DPV-SLAM across various domains.
In conclusion, researchers from Princeton University have proposed DPV-SLAM, an extension of the DPVO odometry system that addresses the limitations of existing deep SLAM approaches. It performs well across different environments while remaining efficient in both computational resources and frame rate. It is evaluated on datasets like EuRoC, TartanAir, TUM-RGBD, and KITTI, where it outperforms traditional approaches. Although it needs a GPU and only offers sparse 3D reconstruction, its overall performance and efficiency make it valuable for the computer vision field. A further limitation of DPV-SLAM is that the global bundle adjustment layer scales quadratically with the number of pose variables, although this is managed by limiting the range to 1000 frames.
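To give a feel for why capping the range matters, here is a small Python sketch of quadratic growth. It rests on two assumptions that the summary above does not confirm: that the quadratic cost can be modeled as one block per pair of poses in a dense pose-by-pose system, and that the 1000-frame limit acts as a cap on how many poses enter the global optimization. The function name `dense_pose_block_count` and the `max_poses` parameter are hypothetical.

```python
def dense_pose_block_count(num_poses, max_poses=1000):
    """Count pose-pair blocks in a dense pose-by-pose system (illustrative model only).

    The number of poses is capped at max_poses, mirroring the assumed 1000-frame limit.
    """
    n = min(num_poses, max_poses)
    return n * n

# Without a cap the block count keeps growing quadratically; with it, growth stops at max_poses.
for frames in (250, 500, 1000, 4000):
    uncapped = dense_pose_block_count(frames, max_poses=frames)
    capped = dense_pose_block_count(frames)
    print(frames, uncapped, capped)
```

Under this model, doubling the number of poses quadruples the work, so a fixed cap keeps the cost of each global update bounded regardless of sequence length.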
Sajjad Ansari is a final year undergraduate from IIT Kharagpur. As a Tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.