Sampling strategies for machine learning-based linear erosion studies: a review approaching contributing area
Tatiane Ferreira Olivatto, José Augusto Di Lollo,
Sampling strategies for machine learning-based linear erosion studies: a review approaching contributing area,
Natural Hazards Research,
2025,
ISSN 2666-5921,
https://doi.org/10.1016/j.nhres.2025.08.002.
(https://www.sciencedirect.com/science/article/pii/S2666592125000708)
Abstract: Linear erosion is a major socio-environmental challenge, influenced by climatic, geomorphological and anthropogenic factors. This study explores how sampling strategies and topographic contributing areas impact machine learning-based applications in linear erosion research. A bibliometric analysis was conducted to assess historical and emerging trends at the intersection of linear erosion and machine learning. Additionally, a systematic extraction of key methodological insights was performed using artificial intelligence tools, followed by an integrative literature review focusing on sampling techniques and drainage influence areas. Results show that spatially stratified sampling, particularly with a 1:1.2 occurrence-to-non-occurrence ratio and non-occurrence points placed outside the hydrological contributing area, improves model generalization and environmental representativeness. However, its effectiveness depends on an adequate understanding of local landscape dynamics. Advanced oversampling techniques further mitigate class imbalance, while spatial cross-validation addresses spatial autocorrelation and enhances robustness. The use of topographic contributing area as a predictive variable shows significant potential due to its role in controlling runoff concentration and sediment flow. Nevertheless, the lack of standardization in how contributing areas are delineated and integrated into models limits broader applicability. Current machine learning approaches often underexplore these spatial components, reducing their physical consistency. This work consolidates methodological advances in data preparation and emphasizes physically informed models that integrate land use and hydrological processes, enabling more robust and interpretable machine learning applications in erosion studies. 
Future research should focus on refining sampling frameworks and integrating contributing area metrics to enhance the robustness and interpretability of machine learning predictions in erosion studies.
Keywords: drainage area; influence area; absence data; spatial sampling; gully; ravine
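
Illustrative sketch (not from the article): the abstract reports that a 1:1.2 occurrence-to-non-occurrence ratio, with non-occurrence points placed outside the hydrological contributing area, combined with spatial cross-validation, improves generalization. The Python below shows one way this could be set up, under loud assumptions. The function names, the coordinates, and the circular buffer standing in for the contributing area are all hypothetical; a real contributing area would be delineated from a DEM by flow routing, and the authors do not prescribe this implementation.

```python
import math
import random


def sample_non_occurrences(occurrences, in_contributing_area, bounds,
                           ratio=1.2, seed=0, max_tries=100_000):
    """Draw round(ratio * n) random points lying outside the contributing area.

    `in_contributing_area` is a caller-supplied predicate (hypothetical here).
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    target = round(ratio * len(occurrences))
    absences = []
    for _ in range(max_tries):
        if len(absences) == target:
            break
        p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if not in_contributing_area(p):  # reject points inside the area
            absences.append(p)
    return absences


def spatial_block_folds(points, block_size, n_folds):
    """Assign folds per grid block so that nearby points share a fold,
    a simple form of spatial (block) cross-validation."""
    blocks = [(int(x // block_size), int(y // block_size)) for x, y in points]
    fold_of_block = {b: i % n_folds for i, b in enumerate(sorted(set(blocks)))}
    return [fold_of_block[b] for b in blocks]


# Toy data: made-up erosion occurrence coordinates in a 100 x 100 study area.
occurrences = [(10.0, 10.0), (20.0, 35.0), (50.0, 50.0), (70.0, 15.0), (85.0, 80.0)]


def in_contributing_area(p, radius=5.0):
    # ASSUMPTION: a circular buffer is only a stand-in for a DEM-derived area.
    return any(math.dist(p, o) <= radius for o in occurrences)


absences = sample_non_occurrences(occurrences, in_contributing_area,
                                  bounds=(0.0, 0.0, 100.0, 100.0))
folds = spatial_block_folds(occurrences + absences, block_size=25.0, n_folds=4)
```

Each fold can then be held out in turn while a model is trained on the remaining folds, so that test points are spatially separated from training points at the block scale.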