Explainable machine learning-based cardiovascular disease prediction in patients with hypertension: Algorithm comparison and SHapley Additive exPlanations (SHAP) analysis
Meng Wang,
Explainable machine learning-based cardiovascular disease prediction in patients with hypertension: Algorithm comparison and SHapley Additive exPlanations (SHAP) analysis,
Archives of Cardiovascular Diseases,
2025,
ISSN 1875-2136,
https://doi.org/10.1016/j.acvd.2025.09.005.
(https://www.sciencedirect.com/science/article/pii/S1875213625007995)
Abstract:
Background: Cardiovascular disease is a leading cause of morbidity in patients with hypertension. Developing a highly accurate and interpretable prediction model is crucial for facilitating early intervention.
Aim: This study aimed to construct and validate a machine learning-based risk prediction model for cardiovascular disease in patients with hypertension to improve the effectiveness of clinical screening.
Methods: The data were sourced from the National Health and Nutrition Examination Survey conducted from 2009 to 2018. We integrated the Least Absolute Shrinkage and Selection Operator, the Boruta algorithm and Recursive Feature Elimination to screen key variables. Four machine learning algorithms were employed to construct the prediction model. The performance of the model was evaluated through 10-fold cross-validation and an independent test set, and the SHapley Additive exPlanations (SHAP) method was used to analyse the feature contribution mechanism.
Results: A total of 2781 participants were included, and eight key variables were finally selected for constructing the prediction model. After a comprehensive evaluation, the Balanced Bagging Classifier model demonstrated the best performance. SHAP analysis revealed that the top-ranked features by descending importance were: neutrophil-lymphocyte ratio, waist-to-height ratio, age, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, kidney disease, sleep disturbance and diabetes.
Conclusions: The machine learning model developed in this study demonstrates good effectiveness and promising generalizability for predicting cardiovascular disease in patients with hypertension, with its utility further enhanced by high interpretability via SHAP analysis. The model shows promise as a practical tool for early cardiovascular disease screening, risk assessment and clinical decision-making in hypertensive populations. Future research should explore the model’s applicability across diverse populations and clinical scenarios.
Keywords: Cardiovascular disease; Hypertension; Machine learning; Prediction model; SHAP
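Illustrative note: the abstract relies on SHAP to attribute a prediction to individual features. The sketch below shows the underlying idea, exact Shapley values, on a toy example using only the Python standard library. It is not the study's model or code: the function `toy_risk`, its coefficients, the `baseline` and `patient` values, and the short feature names (`age`, `nlr`, `whtr`) are all hypothetical, chosen only to echo three of the variables named in the Results. The SHAP library averages over a background dataset rather than a single baseline, so treat this as a simplified picture of the concept.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to one baseline input.

    For every ordering of the features, switch them from their baseline
    value to their value in x one at a time and record how much the
    prediction changes. A feature's Shapley value is its average change
    over all orderings.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(f):
    # Hypothetical risk score with invented coefficients (NOT from the paper).
    # The age * nlr term is an interaction, so the attribution is not trivial.
    return (0.02 * f["age"] + 0.5 * f["nlr"] + 2.0 * f["whtr"]
            + 0.01 * f["age"] * f["nlr"])


# Hypothetical reference input and patient, for illustration only.
baseline = {"age": 50.0, "nlr": 2.0, "whtr": 0.5}
patient = {"age": 70.0, "nlr": 4.0, "whtr": 0.6}

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the property that makes such attributions additive. Enumerating all orderings is only feasible for a handful of features; with eight variables there are already 40320 orderings, which is why practical tools use model-specific or sampling-based approximations.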