Machine learning modelling of sonochemical systems using physically-derived dimensionless groups

2025-11-24

Yucheng Zhu, Ruosi Zhang, Xueliang Zhu, Xuhai Pan, Michael Short, Lian X. Liu, Madeleine J. Bussemaker,
Machine learning modelling of sonochemical systems using physically-derived dimensionless groups,
Ultrasonics Sonochemistry,
Volume 122,
2025,
107593,
ISSN 1350-4177,
https://doi.org/10.1016/j.ultsonch.2025.107593.
(https://www.sciencedirect.com/science/article/pii/S1350417725003724)
Abstract: Sonochemistry involves complex multiparametric effects and nonlinear interactions that challenge conventional analysis and modelling approaches, especially when extrapolating across systems. Current models mainly depend on dimensional input variables, limiting generalisability and interpretability. This work proposes a machine learning strategy that integrates physically derived dimensionless variables (Π-terms) into a categorical boosting (CatBoost) algorithm to overcome these limitations. Four representative sonochemical outputs, namely sonochemiluminescence (SCL) intensity, SCL area, and ultrasonic oxidation from iodide oxidation radicals (IORS) and from both IORS and H2O2, were selected as model targets. Seven supervised learning algorithms, including k-nearest neighbours (KNN), linear regression, support vector regression (SVR), random forest, gradient boosting, extreme gradient boosting (XGBoost), and CatBoost, were evaluated, with tree-based models exhibiting superior performance. CatBoost was finally selected as the baseline model. Regression models using the same Π-terms achieved R2 = 0.67–0.90 on the full dataset but required dataset-specific corrections to predict independent validation sets. In contrast, the machine learning framework reached higher predictive accuracy (R2 = 0.87–0.95 on the reserved test set) and generalised to external validation datasets without additional corrections. Furthermore, a direct comparison between dimensional and dimensionless input strategies showed that dimensionless-input models provided superior generalisability and task-to-task consistency, alleviating plateau effects observed in dimensional models and yielding more stable feature attributions. SHAP analysis highlighted variables associated with cavitation thermal buffering and energy input scaling (>50 % combined importance across tasks), offering mechanistic insights into these nonlinear behaviours that regression could not capture.
Keywords: Sonochemistry; Machine learning; Dimensionless modelling; CatBoost; Mechanism visualisation