Predictors of childhood vaccination uptake in England: an explainable machine learning analysis of regional data (2021–2024)
Amin Noroozi, Sidratul Muntaha Esha, Mansoureh Ghari,
Predictors of childhood vaccination uptake in England: an explainable machine learning analysis of regional data (2021–2024),
Vaccine,
Volume 68,
2025,
127902,
ISSN 0264-410X,
https://doi.org/10.1016/j.vaccine.2025.127902.
(https://www.sciencedirect.com/science/article/pii/S0264410X25011995)
Abstract: Background
Childhood vaccination is a cornerstone of public health, yet disparities in vaccination coverage persist across England. These disparities arise from complex interactions among geographic, demographic, socioeconomic, and cultural (GDSC) factors. While previous studies have explored these predictors, most have relied on cross-sectional data and traditional statistical methods, limiting the ability to capture the dynamic and multivariate nature of vaccine uptake.
Methods
We conducted a machine learning analysis of childhood vaccination coverage across 150 districts in England from 2021 to 2024. Vaccination coverage data from NHS records were used to group districts into coverage clusters by hierarchical clustering, with two-, three-, and six-cluster solutions considered. A CatBoost classifier was then trained to predict each district's vaccination cluster from GDSC variables, and the SHapley Additive exPlanations (SHAP) method was applied to identify and interpret the key predictors.
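The clustering step can be illustrated with a short plain-Python sketch. It is not the authors' code: it assumes Ward linkage on per-district coverage vectors (the abstract does not state the linkage or distance used), and the six-district coverage figures are invented for the example. After the hierarchy is cut at k clusters, the clusters are ranked by mean coverage so that label 0 is always the lowest-coverage group; these labels would then serve as the classification target.

```python
from itertools import combinations
from statistics import mean


def ward_cost(a, b):
    """Increase in total within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a["members"]), len(b["members"])
    sq_dist = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * sq_dist


def agglomerate(rows, k):
    """Bottom-up (agglomerative) clustering with Ward's criterion (assumed).

    Starts with one cluster per district and repeatedly merges the pair whose
    merge increases within-cluster variance least, until k clusters remain.
    """
    clusters = [{"members": [i], "centroid": list(r)} for i, r in enumerate(rows)]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]),
        )
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": a["members"] + b["members"],
            # size-weighted mean of the two centroids
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        # combinations yields i < j, so delete j first to keep index i valid
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return [sorted(c["members"]) for c in clusters]


def coverage_labels(rows, k):
    """Cut the hierarchy at k clusters and rank them by mean coverage.

    Label 0 is the lowest-coverage cluster and k - 1 the highest.
    """
    clusters = agglomerate(rows, k)
    clusters.sort(key=lambda members: mean(mean(rows[i]) for i in members))
    labels = [0] * len(rows)
    for rank, members in enumerate(clusters):
        for i in members:
            labels[i] = rank
    return labels


# Hypothetical coverage (%) for three vaccines in six districts.
coverage = [
    [95, 94, 93], [96, 95, 94], [94, 93, 95],
    [80, 78, 79], [82, 79, 81], [79, 80, 78],
]
for k in (2, 3, 6):
    print(k, coverage_labels(coverage, k))
```

This brute-force version re-scores every pair at each merge, so it is roughly cubic in the number of districts; for real data, `scipy.cluster.hierarchy.linkage(X, method="ward")` followed by `fcluster(Z, t=k, criterion="maxclust")` performs the same kind of cut far more efficiently.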
Results
A two-cluster solution, representing low and high coverage, was identified as optimal based on dendrogram analysis. Using these two clusters, the CatBoost model achieved accuracies of 92.1 %, 90.6 %, and 86.3 % in predicting district vaccination coverage for the three years studied, respectively. SHAP analysis revealed that geographic, cultural, and demographic variables, particularly rurality, English language proficiency, foreign-born status, and ethnic composition, were the most influential predictors. Contrary to expectations, rural districts were significantly more likely to have higher vaccination coverage. Districts with lower coverage had significantly larger populations of non-native English speakers, foreign-born residents, and ethnic minorities. Socioeconomic factors such as deprivation and employment consistently showed lower importance, particularly in 2023–2024.
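The importance rankings above rest on Shapley values: each feature's contribution to a prediction is its weighted average marginal effect over all coalitions of the other features. The sketch below computes them exactly by enumeration under stated assumptions. Features absent from a coalition take a fixed baseline value, the model is a hypothetical three-feature toy function rather than the study's CatBoost classifier, and global importance is summarised as the mean absolute Shapley value, a common SHAP convention that the abstract does not confirm was used here.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for the prediction model(x).

    A feature outside a coalition is replaced by its baseline value, so the
    values sum to model(x) - model(baseline).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_importance(model, rows, baseline):
    """Global importance: mean absolute Shapley value per feature."""
    per_row = [shapley_values(model, r, baseline) for r in rows]
    return [sum(abs(p[i]) for p in per_row) / len(per_row)
            for i in range(len(baseline))]


# Hypothetical toy model with an interaction between features 0 and 2.
def toy_model(z):
    return 2 * z[0] - z[1] + 0.5 * z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
print(mean_abs_importance(toy_model, [x, [2.0, 0.0, 1.0]], baseline))
```

Exact enumeration grows exponentially with the number of features, so it is only practical for a handful of them; tree-specific algorithms such as Tree SHAP (exposed as `TreeExplainer` in the `shap` library) compute the same quantities efficiently for gradient-boosted tree ensembles like CatBoost.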
Conclusions
The findings suggest that vaccination disparities in England are primarily driven by geographic, demographic, and cultural rather than socioeconomic factors. The application of explainable machine learning methods offers actionable insights for public health planning, particularly for identifying and supporting vulnerable communities. The datasets collected and used in this study are available at https://github.com/AminNoroozi/.
Keywords: Childhood vaccination; Geographic and demographic disparities; Public health; Machine learning; Explainable artificial intelligence