A feature engineering technique for enhancing the generalization of machine learning models in estimating crop evapotranspiration

2025-11-20

Gaku Yokoyama, Sohta Harigai, Shigehiro Kubota, Koichi Nomura, Gregory R. Goldsmith, Daisuke Yasutake, Tomoyoshi Hirota, Masaharu Kitano,
A feature engineering technique for enhancing the generalization of machine learning models in estimating crop evapotranspiration,
Agricultural Water Management,
Volume 320,
2025,
109854,
ISSN 0378-3774,
https://doi.org/10.1016/j.agwat.2025.109854.
(https://www.sciencedirect.com/science/article/pii/S0378377425005682)
Abstract: Accurate and precise estimation of evapotranspiration (ET) is crucial for understanding the terrestrial carbon, water, and energy cycles. While process-based models of ET, such as the Penman–Monteith model, offer robust generalization capabilities, they are limited by the need for detailed parameters (e.g., stomatal conductance) that are challenging to measure continuously. On the other hand, machine learning models can estimate ET by capturing relationships between ET and environmental variables without experimentally measuring model parameters. However, machine learning models face the challenge of limited generalizability. This issue is particularly significant given the uncertainty introduced by changing climatic conditions, which can restrict a model's predictive performance when it is applied to different environmental contexts. Therefore, we propose a hybrid modeling approach that combines feature engineering using process-based models with machine learning to improve generalizability while maintaining practicality. Our model first converts environmental variables into leaf-scale ET using mechanistic process-based models and then uses these features along with the leaf area index to estimate the canopy-scale ET using an artificial neural network (ANN). We evaluated the generalization of the hybrid model against a pure ANN model using FLUXNET2015 data. Results show that the hybrid model significantly outperformed the pure ANN model, especially when tested on data beyond the range of the training dataset. Furthermore, the estimation accuracy of the hybrid model was stable even when the values of the model parameters in the process-based models used for feature engineering were varied by ±50 %. This indicates that incorporating a mechanistic understanding of plant environmental responses enhances the generalizability and robustness of ET predictions. These findings underscore the potential of hybrid models to combine the strengths of process-based and machine learning approaches.
Keywords: Eddy covariance; FLUXNET; Hybrid model; Transpiration; Water cycle
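
Illustrative sketch (not from the paper): the feature-engineering step described in the abstract, in which environmental variables are converted into a leaf-scale ET feature and paired with the leaf area index before being passed to an ANN, might look roughly like the following. It assumes a standard Penman-Monteith form with fixed conductances; the abstract does not state which process-based equations, parameter values, or feature layout the authors used, so every function name, constant, and default below is a hypothetical choice made for illustration only.

```python
import math

# Assumed physical constants (typical near-surface values, not from the paper).
RHO_CP = 1.2 * 1013.0   # air density (kg m-3) * specific heat (J kg-1 K-1)
GAMMA = 0.066           # psychrometric constant, kPa K-1


def saturation_vapor_pressure(t_air_c):
    """Saturation vapor pressure (kPa) from air temperature (deg C), Tetens form."""
    return 0.6108 * math.exp(17.27 * t_air_c / (t_air_c + 237.3))


def slope_svp(t_air_c):
    """Slope of the saturation vapor pressure curve (kPa K-1)."""
    return 4098.0 * saturation_vapor_pressure(t_air_c) / (t_air_c + 237.3) ** 2


def leaf_scale_et(rn, t_air_c, vpd_kpa, g_s=0.01, g_a=0.05):
    """Penman-Monteith latent heat flux (W m-2) for a single leaf layer.

    rn: net radiation (W m-2); vpd_kpa: vapor pressure deficit (kPa);
    g_s, g_a: stomatal and aerodynamic conductances (m s-1), hypothetical defaults.
    """
    delta = slope_svp(t_air_c)
    numerator = delta * rn + RHO_CP * vpd_kpa * g_a
    denominator = delta + GAMMA * (1.0 + g_a / g_s)
    return numerator / denominator


def build_features(rn, t_air_c, vpd_kpa, lai, g_s=0.01, g_a=0.05):
    """Return [leaf-scale ET, LAI], the inputs a downstream ANN would receive."""
    return [leaf_scale_et(rn, t_air_c, vpd_kpa, g_s, g_a), lai]


if __name__ == "__main__":
    # One hypothetical half-hourly record: Rn = 400 W m-2, 25 deg C, VPD = 1.5 kPa, LAI = 3.
    print(build_features(400.0, 25.0, 1.5, 3.0))
```

Replacing raw meteorological inputs with a physically derived feature of this kind is what lets the network learn only the leaf-to-canopy scaling. Varying `g_s` here by ±50 % changes the feature value, which is the kind of perturbation the abstract reports the hybrid model tolerating.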