Self-Supervised Surround-View Depth Estimation with Volumetric Feature Fusion

Publication

2022.11.08

36th annual Conference on Neural Information Processing Systems (NeurIPS 2022)에 실린 김중희, 허준화, Tien Nguyen, 정성균 저자의 “Self-supervised surround-view depth estimation with volumetric feature fusion” 논문을 소개합니다. NeurIPS는 세계적인 머신러닝(ML)·AI 학회로 가장 최신의 머신러닝 리서치 기술에 대해 학계와 산업계를 포함한 다양한 연구자들이 참여하여 소개하고 있습니다.

36th annual Conference on Neural Information Processing Systems (NeurIPS 2022)
NeurIPS is one of the biggest machine learning and artificial intelligence conference, where researchers from both academia and industry gather to share the latest machine learning research.
The paper “Self-supervised surround-view depth estimation with volumetric feature fusion” written by Jung-Hee Kim, Junwha Hur, Tien Nguyen, Seong-Gyun Jeong, has been accepted to the NeurIPS 2022.
Click the link below for details.

Title: Self-supervised surround-view depth estimation with volumetric feature fusion
Authors: Jung-Hee Kim, Junhwa Hur, Tien Nguyen, Seong-Gyun Jeong
Abstract: We present a self-supervised depth estimation approach using a unified volumetric feature fusion for surround-view images. Given a set of surround-view images, our method constructs a volumetric feature map by extracting image feature maps from surround-view images and fuse the feature maps into a shared, unified 3D voxel space. The volumetric feature map then can be used for estimating a depth map at each surround view by projecting it into an image coordinate. A volumetric feature contains 3D information at its local voxel coordinate; thus our method can also synthesize a depth map at arbitrary rotated viewpoints by projecting the volumetric feature map into the target viewpoints. Furthermore, assuming static camera extrinsics in the multi-camera system, we propose to estimate a canonical camera motion from the volumetric feature map. Our method leverages 3D spatio- temporal context to learn metric-scale depth and the canonical camera motion in a self-supervised manner. Our method outperforms the prior arts on DDAD and nuScenes datasets, especially estimating more accurate metric-scale depth and consistent depth between neighboring views.