Zero-shot multi-modal large language models v.s. supervised deep learning: A comparative analysis on CT-based intracranial hemorrhage subtyping
Yinuo Wang, Kai Chen, Yue Zeng, Cai Meng, Chao Pan, Zhouping Tang,
Zero-shot multi-modal large language models v.s. supervised deep learning: A comparative analysis on CT-based intracranial hemorrhage subtyping,
Brain Hemorrhages,
Volume 6, Issue 6,
2025,
Pages 323-330,
ISSN 2589-238X,
https://doi.org/10.1016/j.hest.2025.10.004.
(https://www.sciencedirect.com/science/article/pii/S2589238X25000919)
Abstract
Objective
Accurate identification of intracranial hemorrhage (ICH) subtypes on non-contrast CT is crucial for prognosis and treatment but remains challenging due to low contrast and blurred boundaries. This study evaluates the zero-shot performance of multi-modal large language models (MLLMs) versus traditional deep learning in ICH detection and subtyping.
Methods
Using 192 NCCT volumes from the RSNA dataset, we compared MLLMs (GPT-4o, Gemini 2.0 Flash, Claude 3.5 Sonnet V2) with deep learning models (ResNet50, Vision Transformer). MLLMs were prompted for ICH presence, subtype, localization, and volume estimation.
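A zero-shot query of the kind described (presence, subtype, localization, volume estimate) can be sketched as a chat-completions payload with an inline base64-encoded CT slice. This is a hypothetical illustration: the paper's exact prompts are not reproduced here, and only the publicly documented GPT-4o message format is assumed.

```python
import base64

# Hypothetical zero-shot prompt covering the four queried items; the
# study's actual wording is not given in the abstract.
PROMPT = (
    "You are given a non-contrast head CT slice. Answer: "
    "(1) Is intracranial hemorrhage present? "
    "(2) If so, which subtype (epidural, subdural, subarachnoid, "
    "intraparenchymal, intraventricular)? "
    "(3) Where is it located? "
    "(4) Estimate the hemorrhage volume in mL."
)

def build_request(image_bytes: bytes, model: str = "gpt-4o") -> dict:
    """Assemble a chat-completions payload embedding the slice as a data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

The returned dict would be POSTed to the provider's chat endpoint; equivalent payloads exist for Gemini 2.0 Flash and Claude 3.5 Sonnet V2, with per-vendor message schemas.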
Results
Traditional deep learning models outperformed MLLMs in both ICH detection and subtyping. For subtyping, MLLMs showed lower accuracy, with Gemini 2.0 Flash achieving a macro-averaged precision of 0.41 and F1 score of 0.31.
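The macro-averaged metrics reported above average per-class precision and F1 with equal weight per subtype, so rare subtypes count as much as common ones. A minimal stdlib sketch (the three-letter subtype labels in the usage example are illustrative, not the paper's data):

```python
def macro_precision_f1(y_true, y_pred, labels):
    """Macro-averaged precision and F1: per-class scores, unweighted mean."""
    precisions, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(precisions) / len(labels), sum(f1s) / len(labels)

# Illustrative toy labels (not the study's predictions)
p, f = macro_precision_f1(["IPH", "SAH", "SDH", "IPH"],
                          ["IPH", "SDH", "SDH", "SAH"],
                          ["IPH", "SAH", "SDH"])
print(round(p, 3), round(f, 3))  # 0.5 0.444
```

Because the mean is unweighted, a single subtype the model never predicts correctly (F1 = 0) pulls the macro score down sharply, which is one reason zero-shot macro F1 can sit as low as the 0.31 reported here.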
Conclusion
While MLLMs offer enhanced interpretability through language-based interaction, their accuracy in ICH subtyping remains inferior to deep learning networks. Further optimization is needed to improve their utility in three-dimensional medical imaging.
Keywords: Intracranial hemorrhage subtyping; Multi-modal large language models; Medical image classification; Validation