42dot Inc. has presented MiLO, a solution that won 2nd place (honorable runner-up) in the fiercely contested 3D Occupancy Prediction Challenge for autonomous driving at the Computer Vision and Pattern Recognition Conference (CVPR) 2023 in Vancouver, Canada. This was the first worldwide competition for occupancy prediction and one of the most competitive tracks, with almost 150 participating teams from 10 regions. Notably, 42dot's solution does not require access to external data or very large-scale models (models with millions of parameters).
3D occupancy prediction is a promising approach for safe and robust autonomous driving systems. It divides a 3D scene into a grid of voxels and then estimates the occupancy state of each voxel: occupied, free, or unobserved. Occupied voxels are further categorized with semantic predictions (such as car, pedestrian, bicycle, etc.). Occupancy prediction addresses the limitations of the conventional object detection approach in two important respects. First, object detection assigns bounding boxes to objects of interest, and these boxes do not capture the geometric details of the objects. Second, object detectors typically detect only the objects of interest and ignore uncommon categories. For example, an object detector trained for car detection will ignore a kangaroo on the street, which is a serious problem in practice. 3D occupancy prediction preserves 3D geometric information and takes uncommon categories into account via the occupied/free states.
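To make the voxel representation concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not part of MiLO: the function name `voxelize`, the state codes, and the grid parameters are all hypothetical. It assumes labelled 3D points are already available and simply marks the voxels they fall into as occupied. Telling free voxels apart from unobserved ones needs visibility reasoning (for example, ray casting from the sensor), which is left out here, so every voxel without a point stays unobserved.

```python
import numpy as np

# Hypothetical state codes, chosen only for this illustration.
FREE, OCCUPIED, UNOBSERVED = 0, 1, 2


def voxelize(points, labels, grid_min, voxel_size, grid_shape):
    """Map labelled 3D points onto a voxel grid.

    points:     (N, 3) float array of x, y, z coordinates
    labels:     (N,) int array of semantic class ids
    grid_min:   (3,) float array, the corner of the grid with the smallest coordinates
    voxel_size: edge length of one cubic voxel
    grid_shape: (X, Y, Z) number of voxels along each axis
    """
    # Without visibility information every voxel starts as unobserved.
    state = np.full(grid_shape, UNOBSERVED, dtype=np.uint8)
    semantics = np.full(grid_shape, -1, dtype=np.int64)  # -1 means "no class"

    # Integer voxel index of every point.
    idx = np.floor((points - grid_min) / voxel_size).astype(np.int64)

    # Drop points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx, labels = idx[inside], labels[inside]

    # If several points share a voxel, one of their labels is kept.
    state[idx[:, 0], idx[:, 1], idx[:, 2]] = OCCUPIED
    semantics[idx[:, 0], idx[:, 1], idx[:, 2]] = labels
    return state, semantics
```

An occupancy prediction model outputs a grid like this directly from sensor input, so an unusual obstacle still shows up as occupied voxels even when no detector class exists for it.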
In the CVPR 2023 occupancy prediction challenge, team 42dot focused on optimizing the AI model under constrained data and resources. The proposed method is called Multi-task Learning with Localization Ambiguity Suppression for Occupancy Prediction (MiLO). The method is distinctive in two respects. First, varying-depth multi-task learning is proposed to create synergy between tasks and to ease the training of deep networks. Second, localization ambiguity suppression is proposed to adaptively filter out low-confidence predictions based on object class and distance. The final model also combines several other techniques for performance improvement and achieves 52.45 points on the challenge leaderboard without external data or large-scale models. MiLO is expected to provide additional insight into accurate 3D occupancy prediction for safe and robust autonomous driving.
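The idea of filtering low-confidence predictions by object class and distance can be sketched as follows. This is a rough illustration under stated assumptions, not the rule from the technical report: the function name `suppress_ambiguous`, the linear threshold, the choice to raise the threshold with distance, and the use of class 0 as the free class are all hypothetical.

```python
import numpy as np

FREE_CLASS = 0  # hypothetical id for "free" voxels


def suppress_ambiguous(pred_class, confidence, distance, base_thresh, dist_slope):
    """Reset low-confidence voxel predictions to the free class.

    pred_class:  (N,) int array, predicted class id per voxel
    confidence:  (N,) float array, confidence of that prediction
    distance:    (N,) float array, distance of the voxel from the ego vehicle
    base_thresh: (C,) float array, per-class threshold at distance zero (assumed)
    dist_slope:  (C,) float array, per-class threshold growth per unit distance (assumed)
    """
    # The threshold depends on both the predicted class and the distance.
    thresh = base_thresh[pred_class] + dist_slope[pred_class] * distance

    # Keep confident predictions and mark the rest as free.
    keep = confidence >= thresh
    return np.where(keep, pred_class, FREE_CLASS)
```

In this sketch a prediction with the same confidence can be kept close to the vehicle and suppressed far away, and each class can have its own tolerance. The actual thresholds and their dependence on distance are described in the technical report.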
For more information, see the technical report and presentation video below.
- https://opendrivelab.com/e2ead/AD23Challenge/Track_3_42dot.pdf
- https://www.youtube.com/watch?v=HyTojp5bSxA
Thang Vu | AD algorithm
I am in charge of developing perception models from 2D/3D data for autonomous vehicles.