AI for Maritime Security: Comparative Evaluation of CNN and Vision Transformer Architectures for Maritime Object Detection

Researchers evaluated six deep learning architectures for maritime object detection, including CNNs and a Vision Transformer (ViT), using a dataset of 6,468 images under various weather conditions. The ViT achieved 100% accuracy with the lowest error rates and fastest video processing, highlighting its potential for maritime security applications such as surveillance and autonomous navigation.

arXiv:2606.14720v1 Abstract: This study aims to enhance maritime security using advanced Artificial Intelligence (AI) and Computer Vision (CV) techniques. To this end, intelligent object detection systems were designed and assessed that can detect the presence of ships on the sea surface in real time under different environmental conditions. A maritime image dataset of 6,468 images was used, covering cloudy, foggy, rainy, and sunny weather conditions. Six deep learning architectures were evaluated: a baseline Convolutional Neural Network (CNN) model, four transfer learning models (Xception, VGG16, MobileNetV2, and EfficientNetV2L), and a Vision Transformer (ViT) model. The models were compared using multiple performance indicators, including accuracy, Type I and Type II errors, model size, and video processing time. The results show that model performance varies depending on computational constraints and deployment conditions. While lightweight architectures are suitable for resource-limited devices, the ViT achieved the best overall performance, reaching 100% accuracy with the lowest error rates and the fastest video processing time. The findings highlight the potential of AI-driven computer vision systems for maritime surveillance, border protection, and autonomous navigation.
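The abstract reports accuracy alongside Type I and Type II errors. The following is a minimal sketch of how these indicators relate, assuming ship detection is framed as binary image classification (1 = ship present, 0 = no ship) and that the errors are normalized as false-positive and false-negative rates. The abstract does not say whether it reports rates or raw counts, so both are returned. The helper name `evaluate_binary` and the toy labels are hypothetical and not taken from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    tp: int
    fp: int
    tn: int
    fn: int
    accuracy: float
    type_i_rate: float   # false positive rate: FP / (FP + TN)
    type_ii_rate: float  # false negative rate: FN / (FN + TP)


def evaluate_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Hypothetical helper: accuracy and Type I/II error rates for binary labels.

    Assumed encoding: 1 = ship present, 0 = no ship.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1  # ship correctly detected
        elif t == 0 and p == 1:
            fp += 1  # Type I error: ship reported where there is none
        elif t == 0 and p == 0:
            tn += 1  # empty sea correctly recognized
        else:
            fn += 1  # Type II error: a ship was missed
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    type_i_rate = fp / (fp + tn) if (fp + tn) else 0.0
    type_ii_rate = fn / (fn + tp) if (fn + tp) else 0.0
    return BinaryMetrics(tp, fp, tn, fn, accuracy, type_i_rate, type_ii_rate)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    print(evaluate_binary(y_true, y_pred))
```

Under this framing, a model with 100% accuracy, as reported for the ViT, necessarily has zero Type I and Type II errors on the evaluated images, because every prediction falls into either the true-positive or true-negative cell.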