A Hybrid Deep Learning Approach Using Vision Transformer and U-Net for Flood Segmentation
Cyreneo Dofitas, Yong-Woon Kim, Yung-Cheol Byun,
Computers, Materials and Continua,
Volume 86, Issue 2,
2025,
Pages 1-19,
ISSN 1546-2218,
https://doi.org/10.32604/cmc.2025.069374.
(https://www.sciencedirect.com/science/article/pii/S1546221825012512)
Abstract: Recent advances in deep learning have significantly improved flood detection and segmentation from aerial and satellite imagery. However, conventional convolutional neural networks (CNNs) often struggle in complex flood scenarios involving reflections, occlusions, or indistinct boundaries due to limited contextual modeling. To address these challenges, we propose a hybrid flood segmentation framework that integrates a Vision Transformer (ViT) encoder with a U-Net decoder, enhanced by a novel Flood-Aware Refinement Block (FARB). The FARB module improves boundary delineation and suppresses noise by combining residual smoothing with spatial-channel attention mechanisms. We evaluate our model on a UAV-acquired flood imagery dataset, demonstrating that the proposed ViT-UNet+FARB architecture outperforms existing CNN- and Transformer-based models in terms of accuracy and mean Intersection over Union (mIoU). Detailed ablation studies further validate the contribution of each component, confirming that the FARB design significantly enhances segmentation quality. Owing to its better performance and computational efficiency, the proposed framework is well-suited for flood monitoring and disaster response applications, particularly in resource-constrained environments.
Keywords: Flood detection; vision transformer (ViT); U-Net segmentation; image processing; deep learning; artificial intelligence
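The abstract describes the FARB as combining residual smoothing with spatial-channel attention to refine segmentation features. The paper's exact formulation is not given here, so the following is only a minimal NumPy sketch of that general idea: a channel-attention gate, followed by a spatial-attention gate, added back to the input through a residual connection. All gates use fixed, parameter-free statistics (global averages) purely for illustration; a real block would learn these weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def farb_sketch(feat):
    """Illustrative sketch of a spatial-channel attention refinement
    block with a residual connection (hypothetical, not the paper's FARB).

    feat: (C, H, W) feature map.
    """
    # Channel attention: gate each channel by its global average activation.
    ch_gate = sigmoid(feat.mean(axis=(1, 2)))      # shape (C,)
    x = feat * ch_gate[:, None, None]
    # Spatial attention: gate each pixel by its mean across channels.
    sp_gate = sigmoid(x.mean(axis=0))              # shape (H, W)
    x = x * sp_gate[None, :, :]
    # Residual connection: refined features are added back to the input,
    # standing in for the "residual smoothing" the abstract mentions.
    return feat + x

# Tiny synthetic feature map (4 channels, 8x8 spatial grid).
feat = np.random.default_rng(0).normal(size=(4, 8, 8))
out = farb_sketch(feat)
```

In a full ViT-UNet pipeline, a block like this would sit in the decoder path, refining upsampled features before the final segmentation head.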