2D and 3D QSAR-Based Machine Learning Models for Predicting Pyrazole Corrosion Inhibitors for Mild Steel in HCl
H. El-Idrissi, M. Lahyaoui, B. Hachlaf, M. El Yaqoubi, A. Elabbadi, H. Atlas, B. Ihssane, A. Haoudi, A. Mazzah, M. Sfaira, A. Zarrouk, T. Saffaj,
2D and 3D QSAR-Based Machine Learning Models for Predicting Pyrazole Corrosion Inhibitors for Mild Steel in HCl,
Scientific African,
2025,
e03069,
ISSN 2468-2276,
https://doi.org/10.1016/j.sciaf.2025.e03069.
(https://www.sciencedirect.com/science/article/pii/S2468227625005381)
Abstract: In this study, four machine learning models, Support Vector Regression (SVR), Categorical Boosting Regression (CatBoost), Extreme Gradient Boosting (XGBoost), and Backpropagation Artificial Neural Network (BPANN), are proposed to model the inhibition efficiency of 52 pyrazole derivatives on mild steel in an acidic HCl medium. The models are developed using twenty-one 2D descriptors selected through the SelectKBest approach and 3D molecular descriptors. The XGBoost model demonstrates strong predictive ability for both the training set (R² = 0.96 and 0.94) and the test set (R² = 0.75 and 0.85) for the 2D and 3D descriptors, respectively, with RMSE < 2.84. Additionally, the residual analysis and Williams' plot suggest that this model effectively predicts the inhibition efficiency of novel molecules. Furthermore, SHAP analysis identified the key descriptors influencing the model's predictions, thereby confirming the relevance of the variables selected in the QSAR study. This local and global interpretability strengthens the model’s validity by providing mechanistic insights into structure–activity relationships. Overall, this study contributes to the modeling of inhibition efficiency and strengthens predictive approaches to corrosion control in acidic media.
Keywords: pyrazole derivatives; XGBoost; machine learning; Williams' plot; residual analysis
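Note: the abstract reports model quality as R² and RMSE on training and test sets. As a minimal sketch of how these two metrics are conventionally computed (the function names and the sample values below are illustrative assumptions, not data or code from the study), in plain Python:

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical inhibition efficiencies (%), invented for illustration only.
observed = [90.0, 85.0, 70.0, 95.0]
predicted = [88.0, 86.0, 73.0, 93.0]

print(f"R2   = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

Because RMSE is expressed in the units of the target, the reported bound RMSE < 2.84 would correspond to percentage points if inhibition efficiency is expressed in percent.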