Improved Vision-to-Chart Buoy Association with Learned World-to-Image Projection

A lightweight modification to the DETR-based fusion transformer baseline for the MaCVi 2026 Vision-to-Chart data association challenge adds a dedicated MLP, QueryMLP. QueryMLP explicitly predicts each buoy's waterline contact point in the image from chart measurements and IMU orientation data. The predicted pixel coordinates are appended to that buoy's decoder query as a direct spatial prior, which is intended to reduce the geometric reasoning the transformer decoder must perform. On the held-out test set, the modified model achieved an Overall score of 0.7386, an F1 of 0.8055, and an mIoU of 0.6718, placing second among all submissions. The result suggests that an explicit, lightweight world-to-image prior can improve vision-to-chart buoy association.
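As a quick consistency check, the reported Overall score matches the arithmetic mean of F1 and mIoU to four decimal places. This is an inference from the published numbers, not a definition stated in the report, so the official MaCVi 2026 scoring rule should be confirmed before relying on it:

```latex
\frac{F_1 + \mathrm{mIoU}}{2}
  = \frac{0.8055 + 0.6718}{2}
  = \frac{1.4773}{2}
  = 0.73865 \approx 0.7386
```

The last digit depends on the unrounded F1 and mIoU values, which are not given.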

arXiv:2605.22942v1

Abstract: This report presents a lightweight modification to the DETR-based fusion transformer baseline for the MaCVi 2026 Vision-to-Chart data association challenge. The challenge baseline decoder receives per-buoy queries encoding world-space distance and bearing, which forces the transformer to learn the complex geometric projection from world coordinates to image pixels implicitly. Instead, this work trains an additional dedicated MLP, QueryMLP, to explicitly predict the buoy's waterline contact point in the image from chart measurements and IMU orientation data. The predicted pixel coordinates are appended to the baseline decoder query vector, providing a direct spatial prior per buoy and reducing the geometric reasoning burden on the transformer decoder. On the held-out test set, the approach achieves an Overall score of 0.7386 (F1 = 0.8055, mIoU = 0.6718), placing second among all submissions on the challenge leaderboard.
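The world-to-image mapping that the baseline decoder must learn implicitly, and that QueryMLP is trained to approximate, can be sketched under strong simplifying assumptions. The block below is a hypothetical illustration, not the report's implementation. `project_waterline` is an idealized flat-sea pinhole model in which the camera height, focal length, principal point, and rotation conventions are all assumed values. `QueryMLPSketch` is a randomly initialized, one-hidden-layer stand-in for QueryMLP whose layer sizes and input features are guesses. `build_decoder_query` shows the query augmentation step, with a raw `[distance, bearing]` pair standing in for the baseline's actual query encoding.

```python
import math
import random


def project_waterline(distance_m, bearing_rad, pitch_rad, roll_rad,
                      cam_height_m=2.0, focal_px=1000.0, cx=960.0, cy=540.0):
    """Idealized flat-sea pinhole projection of a buoy's waterline point.

    Hypothetical assumptions: flat sea, camera cam_height_m above the water,
    bearing measured from the optical axis (positive to the right), no lens
    distortion. Returns (u, v) in pixels, or None if behind the camera.
    """
    # Level camera frame: x right, y down, z forward.
    x = distance_m * math.sin(bearing_rad)
    y = cam_height_m  # the waterline sits cam_height_m below the camera
    z = distance_m * math.cos(bearing_rad)
    # Pitch (assumed positive = nose up): rotate about the x axis.
    cp, sp = math.cos(pitch_rad), math.sin(pitch_rad)
    y, z = y * cp + z * sp, -y * sp + z * cp
    # Roll (assumed positive = right side down): rotate about the optical axis.
    cr, sr = math.cos(roll_rad), math.sin(roll_rad)
    x, y = x * cr + y * sr, -x * sr + y * cr
    if z <= 0.0:
        return None
    return cx + focal_px * x / z, cy + focal_px * y / z


class QueryMLPSketch:
    """Hypothetical stand-in for QueryMLP: one ReLU hidden layer, random weights.

    Sizes, activations, and input features are assumptions. A real model
    would be trained to regress the annotated waterline contact point.
    """

    def __init__(self, n_in=5, n_hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(2)]
        self.b2 = [0.0, 0.0]

    def __call__(self, features):
        hidden = [max(0.0, sum(w * f for w, f in zip(row, features)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = [sum(w * h for w, h in zip(row, hidden)) + b
               for row, b in zip(self.w2, self.b2)]
        # Sigmoid keeps the prediction in normalized image coordinates [0, 1].
        return [1.0 / (1.0 + math.exp(-o)) for o in out]


def build_decoder_query(distance_m, bearing_rad, pitch_rad, roll_rad, mlp):
    """Append predicted (u, v) to a simplified baseline-style query."""
    base_query = [distance_m, bearing_rad]  # hypothetical baseline encoding
    features = [distance_m / 1000.0, math.sin(bearing_rad),
                math.cos(bearing_rad), pitch_rad, roll_rad]
    u_norm, v_norm = mlp(features)
    return base_query + [u_norm, v_norm]


if __name__ == "__main__":
    print(project_waterline(100.0, 0.0, 0.0, 0.0))  # (960.0, 560.0)
    mlp = QueryMLPSketch()
    print(build_decoder_query(100.0, 0.1, 0.02, -0.01, mlp))
```

The analytic model shows why the mapping is awkward to learn implicitly: the pixel row depends nonlinearly on distance, camera height, and pitch, and roll couples the row and column. A trained QueryMLP would replace the random weights and could, in principle, also absorb effects the flat-sea model ignores, such as lens distortion or camera mounting offsets.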