From Local Training to Large-Scale Mapping: A Comparative Assessment of Machine Learning and Deep Learning for Transferable Satellite-Derived Bathymetry

Researchers from multiple institutions evaluated machine learning and deep learning models for satellite-derived bathymetry (SDB) using Sentinel-2 imagery, finding that preserving spatial continuity during training was the single most impactful design choice. Within the training regions, root mean square error (RMSE) was as low as 0.26 m for depths of 3 m or less. Under cross-regional transfer, the Random Forest baseline degraded sharply (RMSE rising from 1.53 m to 2.99-3.78 m), while the convolutional neural networks stayed more robust (2.46-2.98 m). The team released optimized architectures and pretrained weights to enable scalable bathymetry mapping in optically complex coastal environments.

arXiv:2606.02764v1 Announce Type: new Abstract: Satellite-derived bathymetry (SDB) from multispectral imagery is cost-effective but scales poorly across regions, especially in optically complex coastal environments. We evaluate machine learning and deep learning for transferable SDB over the 0-20 m depth range using Sentinel-2 imagery. A Random Forest baseline and four CNNs (ResNet-50, ResNet-101, EfficientNet-B4, ConvNeXt-Large) are trained on Pratas Island and selected Great Barrier Reef regions, then evaluated on spatially independent intra- and cross-regional test areas. Preserving spatial continuity during training, by keeping contiguous reef blocks rather than random patches, is the single most impactful design choice; we further introduce a Smooth Weight Function (SWF)-weighted RMSE loss that emphasizes near-surface depths. With these choices, intra-regional RMSE ranges from 1.15 to 1.92 m over 0-20 m and is as low as 0.26 m for depths <= 3 m. Random Forest degrades sharply under cross-regional transfer (RMSE rising from 1.53 m to 2.99-3.78 m), while the deep models stay more robust (2.46-2.98 m). On the public MagicBathyNet aerial-RGB benchmark (0-16 m), the proposed networks reach 0.19-0.22 m RMSE, outperforming a U-Net baseline and a task-specific transformer architecture with substantially fewer parameters. We further exploit multi-temporal repeat imagery: training on it broadens diversity, and median-aggregating predictions across passes at inference reduces noise from changing sun angles, atmospheric conditions, water properties, and tides. We release optimized architectures and pretrained weights to enable scalable transfer to new sites.
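
To make two ideas from the abstract concrete, here is a minimal NumPy sketch of a depth-weighted RMSE and of median aggregation across repeat passes. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the form of the Smooth Weight Function, so the logistic weight in `smooth_weight` and its parameters (`shallow_limit`, `softness`, `boost`) are hypothetical stand-ins, as are all function names. Invalid pixels (for example, cloud-masked ones) are assumed to be encoded as NaN.

```python
import numpy as np


def smooth_weight(depth, shallow_limit=3.0, softness=1.0, boost=2.0):
    """Hypothetical smooth weight (NOT the paper's SWF, whose form is not given).

    Returns about 1 + boost near the surface and decays smoothly toward 1
    for depths well beyond `shallow_limit` (all depths in metres).
    """
    depth = np.asarray(depth, dtype=float)
    return 1.0 + boost / (1.0 + np.exp((depth - shallow_limit) / softness))


def weighted_rmse(pred, target, weights):
    """Square root of the weight-normalised mean squared error."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    weights = np.asarray(weights, dtype=float)
    squared_error = (pred - target) ** 2
    return float(np.sqrt(np.sum(weights * squared_error) / np.sum(weights)))


def median_aggregate(predictions):
    """Per-pixel median over repeat passes.

    `predictions` has shape (n_passes, height, width); NaN marks pixels
    that are invalid in a given pass and is ignored by the median.
    """
    stack = np.asarray(predictions, dtype=float)
    return np.nanmedian(stack, axis=0)


if __name__ == "__main__":
    # Toy reference depths (m) and predictions, purely illustrative values.
    target = np.array([1.0, 2.5, 8.0, 15.0])
    pred = np.array([1.4, 2.0, 8.5, 14.0])
    print(weighted_rmse(pred, target, smooth_weight(target)))

    # Three passes over a 1 x 2 tile; one pass has an outlier, one a gap.
    passes = np.array([[[1.0, 2.0]], [[3.0, np.nan]], [[100.0, 4.0]]])
    print(median_aggregate(passes))
```

With uniform weights, `weighted_rmse` reduces to the ordinary RMSE, so the weighting only changes how errors at different depths are traded off. The median is used here, as in the abstract, because a single bad pass (glint, haze, an unusual tide) shifts a per-pixel mean but usually leaves the median unchanged.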