Abstract
We present Multi-task Learning with Localization Ambiguity Suppression for Occupancy Prediction (MiLO), our solution for the camera-based 3D Occupancy Prediction Challenge at CVPR 2023. MiLO is distinguished by two key aspects: (1) varying-depth multi-task learning, which combines perspective semantic prediction, depth estimation, and occupancy prediction to learn more robust representations; and (2) localization ambiguity suppression, which adaptively suppresses low-confidence localization in camera-based systems according to object class and distance. In addition, our method employs several techniques to further boost performance. Our final model achieves 52.45 mIoU without using external data and won 2nd place in the CVPR 2023 3D Occupancy Prediction Challenge.