An efficient fusion-based deep learning framework for land use and land cover image clustering

2026-03-15

Tai Dinh, Dat Tran, Zdena Dobešová, Huynh Van Hong, Daniil Lisik, Rameesh Khan,
An efficient fusion-based deep learning framework for land use and land cover image clustering,
Engineering Applications of Artificial Intelligence,
Volume 161, Part B,
2025,
112061,
ISSN 0952-1976,
https://doi.org/10.1016/j.engappai.2025.112061.
(https://www.sciencedirect.com/science/article/pii/S095219762502069X)
Abstract: Land use and land cover (LULC) analysis is vital for understanding spatial dynamics and informing environmental management, urban planning, and sustainable development. Traditional approaches, such as manual surveys and conventional image clustering methods, often face limitations in scalability and adaptability. This paper presents a novel deep learning framework that combines the Vision Transformer (ViT) and Variational Autoencoder (VAE) to extract complementary feature representations for LULC image clustering. The ViT tokenizes image patches to capture high-level semantic features, while the VAE models latent structures to integrate contextual and structural information. To further improve clustering performance, the framework incorporates Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction followed by k-means++ clustering, enabling a scalable and robust solution for diverse datasets. Experiments on multiple datasets, including the Urban Atlas LULC 2018 dataset and recent LULC maps of Japan and Vietnam, demonstrate the framework’s superior ability to capture complex LULC patterns compared to traditional methods. The datasets and source code will be made publicly available at https://github.com/ClarkDinh/LULCMiner. This framework has broad applications across geospatial and remote sensing engineering, civil and environmental engineering, agricultural planning, transportation, and urban development.
Keywords: Artificial intelligence; Land use and land cover; Urban land use; Deep image clustering; Transformer; Vision transformer; Variational autoencoder; Uniform manifold approximation and projection for dimension reduction (UMAP); K-means++
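The abstract's pipeline is: two feature branches (ViT and VAE), fusion, dimensionality reduction, then k-means++ clustering. The minimal NumPy sketch below illustrates that flow under stated assumptions; it is not the authors' LULCMiner code. Random synthetic vectors stand in for ViT and VAE image embeddings. Fusion is assumed to be concatenation of L2-normalized features, because the abstract does not specify the fusion rule. PCA is swapped in for UMAP so the example needs only NumPy. All function names are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so neither branch dominates by magnitude."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)


def fuse_features(vit_feats: np.ndarray, vae_feats: np.ndarray) -> np.ndarray:
    """Assumed fusion rule (hypothetical): concatenate normalised per-image embeddings."""
    return np.hstack([l2_normalize(vit_feats), l2_normalize(vae_feats)])


def pca_reduce(x: np.ndarray, n_components: int) -> np.ndarray:
    """Linear PCA via SVD; a simple stand-in for UMAP in this sketch."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def sq_dists(x: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance from every point to every centre, shape (n, k)."""
    return ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)


def kmeans_pp_init(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """k-means++ seeding: after a uniform first pick, sample each new centre with
    probability proportional to its squared distance to the nearest chosen centre."""
    centers = x[[rng.integers(len(x))]]
    while len(centers) < k:
        d2 = sq_dists(x, centers).min(axis=1)
        idx = rng.choice(len(x), p=d2 / d2.sum())
        centers = np.vstack([centers, x[idx]])
    return centers


def kmeans(x: np.ndarray, k: int, n_init: int = 5, n_iter: int = 100, seed: int = 0):
    """Lloyd iterations from k-means++ seeds; keep the lowest-inertia restart."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = kmeans_pp_init(x, k, rng)
        for _ in range(n_iter):
            labels = sq_dists(x, centers).argmin(axis=1)
            # Recompute each centre as its cluster mean; keep old centre if empty.
            new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        d = sq_dists(x, centers)
        labels, inertia = d.argmin(axis=1), d.min(axis=1).sum()
        if best is None or inertia < best[0]:
            best = (inertia, labels, centers)
    return best[1], best[2]


# Hypothetical stand-ins: 768-d "ViT" vectors and 32-d "VAE" latents for 3 classes.
rng = np.random.default_rng(42)
k, n_per = 3, 50
true = np.repeat(np.arange(k), n_per)
vit = rng.normal(size=(k, 768))[true] + 0.3 * rng.normal(size=(k * n_per, 768))
vae = rng.normal(size=(k, 32))[true] + 0.3 * rng.normal(size=(k * n_per, 32))

z = pca_reduce(fuse_features(vit, vae), n_components=2)
labels, centers = kmeans(z, k=k)
print(np.bincount(labels))
```

In a fuller implementation, the PCA step would be replaced by UMAP (for example `umap.UMAP` from the `umap-learn` package). Likewise, scikit-learn's `KMeans` already uses k-means++ initialization by default, so the hand-written clustering here only serves to make the seeding rule explicit.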