Abstract
We present Multi-task Learning with Localization Ambiguity Suppression for Occupancy Prediction (MiLO) as our solution to the camera-based 3D Occupancy Prediction Challenge at CVPR 2023. MiLO is distinctive in two respects: (1) varying-depth multi-task learning, which combines perspective semantic prediction, depth estimation, and occupancy prediction to learn more robust representations; and (2) localization ambiguity suppression, which adaptively suppresses low-confidence localization in camera-based systems with respect to object class and distance. In addition, our method employs several further techniques to boost performance. Our final model achieves 52.45 mIoU without using external data and wins 2nd place in the CVPR 2023 3D Occupancy Prediction Challenge.